You might remember that in my last post about the Ubuntu debuginfod
service I talked about wanting to extend it and make it index and
serve source code from packages. I'm excited to announce that this is
now a reality since the Ubuntu Lunar (23.04) release.
The feature should work for a lot of packages from the archive, but
not all of them. Keep reading to better understand why.
The problem
While debugging a package in Ubuntu, one of the first steps you need
to take is to install its source code. There are some problems with
this:
- apt-get source requires dpkg-dev to be installed, which ends up
  pulling in a lot of other dependencies.
- GDB needs to be taught how to find the source code for the package
  being debugged. This can usually be done by using the dir
  command, but finding the proper path to use is usually not trivial,
  and you find yourself having to use more complex commands like
  set substitute-path, for example.
- You have to make sure that the version of the source package is the
  same as the version of the binary package(s) you want to debug.
- If you want to debug the libraries that the package links against,
  you will face the same problems described above for each library.
So yeah, not a trivial/pleasant task after all.
The solution
Debuginfod can index source code as well as debug symbols. It is
smart enough to keep a relationship between the source package and the
corresponding binary's Build-ID, which is what GDB will use when
making a request for a specific source file. This means that, just
like what happens for debug symbol files, the user does not need to
keep track of the source package version.
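To make the Build-ID relationship concrete, here is a minimal sketch of how a client could ask a debuginfod server for one source file. It assumes the debuginfod HTTP layout `/buildid/BUILDID/source/ABSOLUTE-PATH`; the helper names, the example Build-ID and the /usr/src path are hypothetical, and in practice `debuginfod-find source BUILDID PATH` from elfutils (or GDB itself) makes this request for you.

```shell
# Print the Build-ID stored in an ELF file's build-id note.
# readelf -n prints a line like "    Build ID: 3f2a...".
build_id() {
  readelf -n "$1" | awk '/Build ID:/ { print $3; exit }'
}

# Hypothetical helper: build the debuginfod URL for one source file.
# $1 = server base URL, $2 = Build-ID, $3 = absolute source path.
debuginfod_source_url() {
  case $3 in
    /*) printf '%s/buildid/%s/source%s\n' "${1%/}" "$2" "$3" ;;
    *) echo "source path must be absolute: $3" >&2; return 1 ;;
  esac
}

# Usage sketch (needs binutils, curl and network access):
# id=$(build_id /usr/bin/hello)
# curl -fsSL "$(debuginfod_source_url https://debuginfod.ubuntu.com "$id" /usr/src/hello-2.10/src/hello.c)"
```

The Build-ID, not the package version, is the lookup key, which is why the user never has to track source package versions.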
While indexing source code, debuginfod will also maintain a record of
the relative pathname of each source file. No more fiddling with
paths inside the debugger to get things working properly.
Last, but not least, if there's a need for a library source file and
if it's indexed by debuginfod, then it will get downloaded
automatically as well.
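On the client side, elfutils and GDB find servers through the `DEBUGINFOD_URLS` environment variable, a space-separated list of URLs. A minimal sketch, assuming the Ubuntu service lives at https://debuginfod.ubuntu.com (recent Ubuntu releases may already set this variable for you); the function name is made up for illustration:

```shell
ubuntu_debuginfod=https://debuginfod.ubuntu.com

# Hypothetical helper: append the Ubuntu server to DEBUGINFOD_URLS
# (space-separated) unless it is already listed, then export it.
enable_ubuntu_debuginfod() {
  case " ${DEBUGINFOD_URLS-} " in
    *" $ubuntu_debuginfod "*) ;;  # already present, nothing to do
    *) DEBUGINFOD_URLS="${DEBUGINFOD_URLS:+$DEBUGINFOD_URLS }$ubuntu_debuginfod" ;;
  esac
  export DEBUGINFOD_URLS
}

enable_ubuntu_debuginfod
```

Inside GDB, `set debuginfod enabled on` skips the interactive prompt that recent versions show before the first download.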
...but not a perfect one
In order to make debuginfod happy when indexing source files, I had to
patch dpkg and make it always use -fdebug-prefix-map when
compiling stuff. This GCC option is used to remap pathnames inside
the DWARF, which is needed because in Debian/Ubuntu we build our
packages inside chroots and the build directories end up containing a
bunch of random cruft (like /build/ayusd-ASDSEA/something/here). So
we need to make sure the path prefix (the /build/ayusd-ASDSEA part)
is uniform across all packages, and that's where -fdebug-prefix-map
helps.
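What the flag does to each recorded path can be sketched in a few lines of shell. This only mimics GCC's rule (a path that starts with OLD has that prefix replaced by NEW); the /usr/src/hello-2.10 target directory is a hypothetical example, not necessarily the exact prefix the patched dpkg uses.

```shell
# Hypothetical helper that mimics -fdebug-prefix-map=OLD=NEW on one path:
# if PATH begins with OLD, that prefix is replaced by NEW; otherwise PATH
# is printed unchanged.
remap_path() {
  old=$1 new=$2 path=$3
  case $path in
    "$old"*) printf '%s%s\n' "$new" "${path#"$old"}" ;;
    *) printf '%s\n' "$path" ;;
  esac
}

remap_path /build/ayusd-ASDSEA /usr/src/hello-2.10 /build/ayusd-ASDSEA/src/hello.c
# prints /usr/src/hello-2.10/src/hello.c
# The real compiler invocation would look something like:
# gcc -g -fdebug-prefix-map=/build/ayusd-ASDSEA=/usr/src/hello-2.10 -c src/hello.c
```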
This means that the package must honour dpkg-buildflags during its
build process, otherwise the magic flag won't be passed and your DWARF
will end up with bogus paths. This should not be a big problem,
because most of our packages do honour dpkg-buildflags, and those
that don't should be fixed anyway.
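A quick way to see whether a build environment hands out such a flag is to inspect the output of dpkg-buildflags. The helper below is a hypothetical sketch; it accepts either -fdebug-prefix-map= or -ffile-prefix-map= (GCC documents the latter as implying the former), and it cannot tell you whether a given package's build system actually uses the flags it is given.

```shell
# Hypothetical helper: succeed if a compiler flag string contains a
# debug-path remapping option (-fdebug-prefix-map= or -ffile-prefix-map=).
has_prefix_map() {
  case " $1 " in
    *" -fdebug-prefix-map="* | *" -ffile-prefix-map="*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (on a system with dpkg-dev installed):
# has_prefix_map "$(dpkg-buildflags --get CFLAGS)" && echo "prefix map present"
```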
...especially if you're using LTO
Ubuntu enables LTO by default, and unfortunately we are affected by an
annoying (and complex) bug that results in those bogus pathnames not
being properly remapped. The bug doesn't affect all packages, but if
you see GDB having trouble finding a source file whose full path
does not start with /usr/src/..., that is a good indication that you're
being affected by this bug. Hopefully we will see some progress in
the coming weeks.
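Following that rule of thumb, a hypothetical filter can pick out suspicious entries from a list of source paths, for instance ones copied from GDB's error messages. It only encodes the /usr/src/ heuristic described above, so it will also flag relative names and is not a definitive test for the LTO bug.

```shell
# Hypothetical filter: read one source path per line on stdin and print
# only those that do not start with /usr/src/ (i.e. likely not remapped).
flag_unmapped() {
  while IFS= read -r path; do
    case $path in
      /usr/src/*) ;;               # remapped as expected
      *) printf '%s\n' "$path" ;;  # suspicious: not under /usr/src/
    esac
  done
}

# Usage sketch: flag_unmapped < paths-from-gdb.txt
```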
Your feedback is important to us
If you have any comments, or if you found something strange that looks
like a bug in the service, please reach out. You can either send an
email to my public inbox (see below) or file a bug against the
ubuntu-debuginfod project on Launchpad.
Posted on May 12, 2023
Even when it's \m/.
Years ago I watched my SO play Brütal Legend and of course loved it, but I've only been using used computers for a long time, and none of them was really able to run modern games.
Admittedly, he told me that I could use his computer to play the game while he wasn't home (and I do have an account on that computer, which I've sporadically used for computationally intensive stuff, but always remotely), but it was a hassle, and I never did.
This year, however, he gifted me a shiny new CPU and motherboard, and among other things that meant games from this century!
The first thing I've spent time on was 0ad (which admittedly already worked on one of the old computers, as long as the map wasn't too big), but now it was time to play basically the one recent proprietary game I had been wanting to play.
So, this afternoon I started by trying to copy the installer (it was bought from a Humble Bundle; I don't have Steam) from the home server to my PC, and the home server froze. OK, I could copy it through something other than git annex (or from the offline hard disk backup, as I did).
Then I tried to run the installer, which resulted in the really helpful error message:
bash: ./BrutalLegend-Linux-2013-05-07-setup.bin: cannot execute: required file not found
ok, then surely ldd can help:
not a dynamic executable
maybe it doesn't like being a symlink (remember, git annex), but no, that wasn't the problem. ah! maybe file can help, and indeed:
argh. Why does proprietary software hate us?
Oh, well: https://wiki.debian.org/Multiarch/HOWTO, dpkg --add-architecture i386 followed by apt update and apt install libc6-i386, and the installer started.
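That bash error is what you get when execve() cannot find the program interpreter recorded in the binary; for a 32-bit x86 executable that is /lib/ld-linux.so.2, which a plain amd64 system does not ship. `file` tells you the binary's class directly; as a self-contained alternative, here is a hypothetical helper that reads the relevant ELF header byte with od.

```shell
# Hypothetical helper: report whether FILE is a 32-bit or 64-bit ELF binary
# by reading the EI_CLASS byte at offset 4 of the ELF header
# (1 = ELFCLASS32, 2 = ELFCLASS64).
elf_class() {
  magic=$(od -A n -N 4 -t x1 "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then  # "\177ELF"
    echo "not ELF"
    return 1
  fi
  class=$(od -A n -j 4 -N 1 -t u1 "$1" | tr -d ' \n')
  case $class in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Usage: elf_class ./BrutalLegend-Linux-2013-05-07-setup.bin
# On amd64, a "32-bit" answer means the i386 loader and libc are needed:
#   dpkg --add-architecture i386 && apt update && apt install libc6-i386
```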
Of course this didn't mean that the game could run, but at least it was spitting out the right error messages, and I could quickly see what the other missing packages were:
and the game started!
and
no. audio.
I often play games with no audio, because I can't wear headphones, but here the soundtrack is basically 50% of the reason one would play this game.
Back when my SO had played the game, audio was still through pulseaudio, while now I'm using pipewire (and I wasn't sure that the game wasn't old enough to want to use alsa), so I started to worry a bit.
And this time, there was no error message to help, but some googling (on searx) and trial and error gave me this list of packages:
and that was it! the game started AND I could hear music!
And then it was time for dinner, and I couldn't play.
(You may notice that this post has been posted quite some time after dinner. Most of this time wasn't spent writing the post.)
Anyway, as soon as I've defeated and crushed Doviculus I'm going back to 0ad. Or maybe wesnoth. Or some other Free Software and frustration-free game.
DEP-17 progress, by Helmut and Emilio
We posted a proposal for modifying dpkg to better cope with directory aliasing.
After an initial period of silence, the discussion took off, but was mostly
diverted to a competing proposal by Luca Boccassi: Do not change dpkg at all,
but still move all files affected by aliasing to their canonical location,
thus removing the bad effects of aliasing. We facilitated this discussion and
performed extensive analysis of this and competing proposals, highlighting
resulting problems and proposing solutions or workarounds. We performed a
detailed analysis of how aliasing affects usage of dpkg-divert,
dpkg-statoverride and update-alternatives. Details are available on the
debian-dpkg mailing list thread.
Debian Reimbursements Web App Progress, by Stefano Rivera
In a project funded by
Freexian's Project Funding initiative,
Stefano made some more progress on the
Debian Reimbursements Web App.
The full workflow can now be exercised, completing the first milestone of the
project, the Working Prototype.
Stefano attended several DebConf planning meetings, and did some work on the
DebConf 23 website.
Stefano updated distro-info-data to include the release date of Debian
bullseye, and added the next Ubuntu release, Mantic Minotaur. This required a
round of updates to all the stable releases, LTS, and ELTS.
Helmut sent patches for 13 cross build failures and filed 104 RC bugs for
missing Breaks and Replaces.
As suggested in my initial announcement of apt-sigstore, my plan was to look into stronger uses of Sigstore than rekor, and I'm now happy to announce that the apt-cosign plugin has been added to apt-sigstore and the operational project debdistcanary is publishing cosign statements about the InRelease file published by the following distributions: Trisquel GNU/Linux, PureOS, Gnuinos, Ubuntu, Debian and Devuan.
Summarizing the commands that you need to run as root to experience the great new world:
Then run your usual apt-get update and look in the syslog to debug things.
This is the kind of work that gets done while waiting for the build machines to attempt to reproducibly build PureOS. Unfortunately, the result is that a meager 16% of the 765 added/modified packages are reproducible by me. There is some infrastructure work to be done to improve things: we should use sbuild, for example. The build infrastructure should produce signed statements for each package it builds: one statement saying that it attempted to reproducibly build a particular binary package (thus generating some build logs and diffoscope output for auditing), and one statement saying that it actually was able to reproduce a package. Verifying such claims during apt-get install or possibly dpkg -i is a logical next step.
There is some code cleanup and release work to be done now. Which distribution will be the first apt-based distribution that includes native support for Sigstore? Let's see.
Sigstore is not the only relevant transparency log around, and I've been trying to learn a bit about Sigsum to be able to support it as well. The more improved confidence about system security, the merrier!
I recently bought a Banana Pi BPI-M5, which uses the Amlogic S905X3 SoC: these are my notes about installing Debian on it.
While this SoC is supported by the upstream U-Boot it is not supported by the Debian U-Boot package, so debian-installer does not work. Do not be fooled by seeing the DTB file for this exact board being distributed with debian-installer: all DTB files are, and it does not mean that the board is supposed to work.
As I documented in #1033504, the Debian kernels are currently missing some patches needed to support the SD card reader.
I started by downloading an Armbian Banana Pi image and booted it from an SD card. From there I partitioned the eMMC, which always appears as /dev/mmcblk1:
Make sure to leave enough space before the first partition, or else U-Boot will overwrite it: as is common for many ARM SoCs, U-Boot lives somewhere in the gap between the MBR and the first partition.
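Whether the gap is large enough is simple arithmetic. The sketch below is hypothetical: it assumes 512-byte sectors and that U-Boot is written starting at byte 512, right after the MBR. The real offset and image size depend on the SoC and on how the image is written, so check the platform's install script before relying on it.

```shell
# Hypothetical check: with U-Boot written starting at byte 512 (right after
# the MBR), does an image of IMAGE_BYTES end before the first partition,
# which starts at START_SECTOR (512-byte sectors)?
uboot_fits() {
  start_sector=$1 image_bytes=$2
  [ $((start_sector * 512)) -ge $((512 + image_bytes)) ]
}

# Usage sketch: uboot_fits 8192 "$(stat -c %s u-boot.bin)" && echo "fits"
```

For example, a 1 MiB image fits before a partition starting at sector 8192 (4 MiB) but not before one starting at sector 2048 (exactly 1 MiB), because the MBR sector comes first.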
I looked at Armbian's /usr/lib/u-boot/platform_install.sh and installed U-Boot by manually copying it to the eMMC:
I wanted to have a fully working flash-kernel, so I used Armbian's boot.scr as a template to create /etc/flash-kernel/bootscript/bootscr.meson and then added a custom entry for the Banana Pi to /etc/flash-kernel/db:
All things considered, I do not think that I would recommend that Debian users buy Amlogic-based boards, since there are many other better-supported SoCs.
I've used hardware-backed OpenPGP keys since 2006, when I imported newly generated rsa1024 subkeys to an FSFE Fellowship card. This worked well for several years, and I recall buying more ZeitControl cards for multi-machine usage and backup purposes. As a side note, I recall being unsatisfied with the weak 1024-bit RSA subkeys at the time (my primary key was a somewhat stronger 1280-bit RSA key created back in 2002), but OpenPGP cards at the time didn't support more than 1024-bit RSA, and were (and still often are) also limited to power-of-two RSA key sizes, which I dislike.
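The power-of-two restriction is easy to state precisely: since n & (n - 1) clears the lowest set bit of n, the result is zero exactly when a positive n has a single bit set. A small sketch applying this to the key sizes mentioned in this post (rsa4096 added only for comparison):

```shell
# Succeeds if $1 is a positive power of two.
is_pow2() {
  [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for bits in 1024 1280 2048 3744 4096; do
  if is_pow2 "$bits"; then
    echo "rsa$bits: power of two"
  else
    echo "rsa$bits: not a power of two"
  fi
done
```

So neither the 1280-bit nor the later 3744-bit key size could be placed on a card limited to power-of-two RSA sizes.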
I had my master key on disk with a strong password for a while, mostly to refresh the expiration time of the subkeys and to sign others' OpenPGP keys. At some point I stopped carrying around encrypted copies of my master key. That was my main setup when I migrated to a new, stronger RSA 3744-bit key with rsa2048 subkeys on a YubiKey NEO back in 2014. At that point, signing others' OpenPGP keys was a rare enough occurrence that I settled for bringing out my offline machine to perform this operation, transferring the public key to sign on USB sticks. In 2019 I re-evaluated my OpenPGP setup and ended up creating an offline Ed25519 key with subkeys on a FST-01G running Gnuk. My approach for signing others' OpenPGP keys was still to bring out my offline machine and sign things using the master secret, using USB sticks for storage and transport. This meant I almost never did it, because it took too much effort. So my 2019-era Ed25519 key still only has a handful of signatures on it, since I had essentially stopped signing others' keys, which is the traditional way of getting signatures in return.
None of this caused any critical problem for me because I continued to use my old 2014-era RSA3744 key in parallel with my new 2019-era Ed25519 key, since too many systems didn't handle Ed25519. However, during 2022 this changed, and the only remaining environment where I still used my RSA3744 key was Debian, which requires OpenPGP signatures on the new key to allow it to replace an older key. I was in denial about this sub-optimal solution during 2022 and endured its practical consequences, having to use the YubiKey NEO (which I had replaced with a permanently inserted YubiKey Nano at some point) for Debian-related purposes alone.
In December 2022 I bought a new laptop and set up a FST-01SZ with my Ed25519 key, and while I have taken a vacation from Debian, I continue to extend the expiration period on the old RSA3744 key in case I ever have to use it again, so the overall OpenPGP setup was still sub-optimal. Having two valid OpenPGP keys at the same time causes people to use both for email encryption (leading me to have to use both devices), and the WKD Key Discovery protocol doesn't like two valid keys either. At FOSDEM'23 I ran into Andre Heinecke at GnuPG and I couldn't help complaining about how complex and unsatisfying all OpenPGP-related matters were, and he mildly ignored my rant and asked why I didn't put the master key on another smartcard. The comment sank in when I came home, and recently I connected all the dots; this post is a summary of what I did to move my offline OpenPGP master key to a Nitrokey Start.
First, a word about device choice: I still prefer to use hardware devices that are as compatible with free software as possible, but the FST-01G or FST-01SZ are no longer easily available for purchase. I got a comment about the Nitrokey Start in my last post, and had two of them available to experiment with. There are things to dislike about the Nitrokey Start compared to the YubiKey (e.g., relatively insecure chip architecture, the bulkier form factor and lack of FIDO/U2F/OATH support), but as far as I know there is no more widely available owner-controlled device that is manufactured for the intended purpose of implementing an OpenPGP card. Thus it hits the sweet spot for me.
The first step is to run the latest firmware on the Nitrokey Start, for bug fixes and important OpenSSH 9.0 compatibility, and there is reproducibly built firmware published that you can install using pynitrokey. I run Trisquel 11 aramo on my laptop, which does not include the Python Pip package (likely because it promotes installing non-free software), so that was a slight complication. Building the firmware locally may have worked, and I would like to do that eventually to confirm the published firmware; however, to save time I settled for installing the Ubuntu 22.04 packages on my machine:
$ sha256sum python3-pip*
ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325 python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
$ doas dpkg -i python3-pip*
...
$ doas apt install -f
...
$
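Rather than comparing sha256sum output by eye, the digests can be checked mechanically. A minimal sketch using GNU coreutils' `sha256sum --check`; the helper name is made up, and the SHA256SUMS content in the comment simply repeats the digests printed above, so it only confirms that the files match what this post shows, not where those digests came from.

```shell
# Hypothetical helper: verify files against a SHA256SUMS-style list
# ("<hex digest>  <file name>" per line); fails if any file mismatches.
# --strict: malformed lines are errors; --quiet: only report failures.
verify_sha256sums() {
  sha256sum --check --strict --quiet "$1"
}

# Usage, assuming the two .deb files are in the current directory:
# cat > SHA256SUMS <<'EOF'
# ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c  python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
# e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325  python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
# EOF
# verify_sha256sums SHA256SUMS && doas dpkg -i python3-pip*.deb
```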
Installing pynitrokey downloaded a bunch of dependencies, and it would be nice to audit the license and security vulnerabilities for each of them. (Verbose output below slightly redacted.)
jas@kaka:~$ pip3 install --user pynitrokey
Collecting pynitrokey
Downloading pynitrokey-0.4.34-py3-none-any.whl (572 kB)
Collecting frozendict~=2.3.4
Downloading frozendict-2.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (113 kB)
Requirement already satisfied: click<9,>=8.0.0 in /usr/lib/python3/dist-packages (from pynitrokey) (8.0.3)
Collecting ecdsa
Downloading ecdsa-0.18.0-py2.py3-none-any.whl (142 kB)
Collecting python-dateutil~=2.7.0
Downloading python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
Collecting fido2<2,>=1.1.0
Downloading fido2-1.1.0-py3-none-any.whl (201 kB)
Collecting tlv8
Downloading tlv8-0.10.0.tar.gz (16 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: certifi>=14.5.14 in /usr/lib/python3/dist-packages (from pynitrokey) (2020.6.20)
Requirement already satisfied: pyusb in /usr/lib/python3/dist-packages (from pynitrokey) (1.2.1.post1)
Collecting urllib3~=1.26.7
Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting spsdk<1.8.0,>=1.7.0
Downloading spsdk-1.7.1-py3-none-any.whl (684 kB)
Collecting typing_extensions~=4.3.0
Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: cryptography<37,>=3.4.4 in /usr/lib/python3/dist-packages (from pynitrokey) (3.4.8)
Collecting intelhex
Downloading intelhex-2.3.0-py2.py3-none-any.whl (50 kB)
Collecting nkdfu
Downloading nkdfu-0.2-py3-none-any.whl (16 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pynitrokey) (2.25.1)
Collecting tqdm
Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting nrfutil<7,>=6.1.4
Downloading nrfutil-6.1.7.tar.gz (845 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (from pynitrokey) (1.15.0)
Collecting crcmod
Downloading crcmod-1.7.tar.gz (89 kB)
Preparing metadata (setup.py) ... done
Collecting libusb1==1.9.3
Downloading libusb1-1.9.3-py3-none-any.whl (60 kB)
Collecting pc_ble_driver_py>=0.16.4
Downloading pc_ble_driver_py-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting piccata
Downloading piccata-2.0.3-py3-none-any.whl (21 kB)
Collecting protobuf<4.0.0,>=3.17.3
Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting pyserial
Downloading pyserial-3.5-py2.py3-none-any.whl (90 kB)
Collecting pyspinel>=1.0.0a3
Downloading pyspinel-1.0.3.tar.gz (58 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from nrfutil<7,>=6.1.4->pynitrokey) (5.4.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil~=2.7.0->pynitrokey) (1.16.0)
Collecting pylink-square<0.11.9,>=0.8.2
Downloading pylink_square-0.11.1-py2.py3-none-any.whl (78 kB)
Collecting jinja2<3.1,>=2.11
Downloading Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting bincopy<17.11,>=17.10.2
Downloading bincopy-17.10.3-py3-none-any.whl (17 kB)
Collecting fastjsonschema>=2.15.1
Downloading fastjsonschema-2.16.3-py3-none-any.whl (23 kB)
Collecting astunparse<2,>=1.6
Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting oscrypto~=1.2
Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Collecting deepmerge==0.3.0
Downloading deepmerge-0.3.0-py2.py3-none-any.whl (7.6 kB)
Collecting pyocd<=0.31.0,>=0.28.3
Downloading pyocd-0.31.0-py3-none-any.whl (12.5 MB)
Collecting click-option-group<0.6,>=0.3.0
Downloading click_option_group-0.5.5-py3-none-any.whl (12 kB)
Collecting pycryptodome<4,>=3.9.3
Downloading pycryptodome-3.17-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting pyocd-pemicro<1.2.0,>=1.1.1
Downloading pyocd_pemicro-1.1.5-py3-none-any.whl (9.0 kB)
Requirement already satisfied: colorama<1,>=0.4.4 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (0.4.4)
Collecting commentjson<1,>=0.9
Downloading commentjson-0.9.0.tar.gz (8.7 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: asn1crypto<2,>=1.2 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.0)
Collecting pypemicro<0.2.0,>=0.1.9
Downloading pypemicro-0.1.11-py3-none-any.whl (5.7 MB)
Collecting libusbsio>=2.1.11
Downloading libusbsio-2.1.11-py3-none-any.whl (247 kB)
Collecting sly==0.4
Downloading sly-0.4.tar.gz (60 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml<0.18.0,>=0.17
Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting cmsis-pack-manager<0.3.0
Downloading cmsis_pack_manager-0.2.10-py2.py3-none-manylinux1_x86_64.whl (25.1 MB)
Collecting click-command-tree==1.1.0
Downloading click_command_tree-1.1.0-py3-none-any.whl (3.6 kB)
Requirement already satisfied: bitstring<3.2,>=3.1 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (3.1.7)
Collecting hexdump~=3.3
Downloading hexdump-3.3.zip (12 kB)
Preparing metadata (setup.py) ... done
Collecting fire
Downloading fire-0.5.0.tar.gz (88 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse<2,>=1.6->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.37.1)
Collecting humanfriendly
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting argparse-addons>=0.4.0
Downloading argparse_addons-0.12.0-py3-none-any.whl (3.3 kB)
Collecting pyelftools
Downloading pyelftools-0.29-py2.py3-none-any.whl (174 kB)
Collecting milksnake>=0.1.2
Downloading milksnake-0.1.5-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: appdirs>=1.4 in /usr/lib/python3/dist-packages (from cmsis-pack-manager<0.3.0->spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.4)
Collecting lark-parser<0.8.0,>=0.7.1
Downloading lark-parser-0.7.8.tar.gz (276 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2<3.1,>=2.11->spsdk<1.8.0,>=1.7.0->pynitrokey) (2.0.1)
Collecting asn1crypto<2,>=1.2
Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Collecting wrapt
Downloading wrapt-1.15.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)
Collecting future
Downloading future-0.18.3.tar.gz (840 kB)
Preparing metadata (setup.py) ... done
Collecting psutil>=5.2.2
Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting capstone<5.0,>=4.0
Downloading capstone-4.0.2-py2.py3-none-manylinux1_x86_64.whl (2.1 MB)
Collecting naturalsort<2.0,>=1.5
Downloading naturalsort-1.5.1.tar.gz (7.4 kB)
Preparing metadata (setup.py) ... done
Collecting prettytable<3.0,>=2.0
Downloading prettytable-2.5.0-py3-none-any.whl (24 kB)
Collecting intervaltree<4.0,>=3.0.2
Downloading intervaltree-3.1.0.tar.gz (32 kB)
Preparing metadata (setup.py) ... done
Collecting ruamel.yaml.clib>=0.2.6
Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)
Collecting termcolor
Downloading termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting sortedcontainers<3.0,>=2.0
Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from prettytable<3.0,>=2.0->pyocd<=0.31.0,>=0.28.3->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.2.5)
Building wheels for collected packages: nrfutil, crcmod, sly, tlv8, commentjson, hexdump, pyspinel, fire, intervaltree, lark-parser, naturalsort, future
Building wheel for nrfutil (setup.py) ... done
Created wheel for nrfutil: filename=nrfutil-6.1.7-py3-none-any.whl size=898520 sha256=de6f8803f51d6c26d24dc7df6292064a468ff3f389d73370433fde5582b84a10
Stored in directory: /home/jas/.cache/pip/wheels/39/2b/9b/98ab2dd716da746290e6728bdb557b14c1c9a54cb9ed86e13b
Building wheel for crcmod (setup.py) ... done
Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31422 sha256=5149ac56fcbfa0606760eef5220fcedc66be560adf68cf38c604af3ad0e4a8b0
Stored in directory: /home/jas/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
Building wheel for sly (setup.py) ... done
Created wheel for sly: filename=sly-0.4-py3-none-any.whl size=27352 sha256=f614e413918de45c73d1e9a8dca61ca07dc760d9740553400efc234c891f7fde
Stored in directory: /home/jas/.cache/pip/wheels/a2/23/4a/6a84282a0d2c29f003012dc565b3126e427972e8b8157ea51f
Building wheel for tlv8 (setup.py) ... done
Created wheel for tlv8: filename=tlv8-0.10.0-py3-none-any.whl size=11266 sha256=3ec8b3c45977a3addbc66b7b99e1d81b146607c3a269502b9b5651900a0e2d08
Stored in directory: /home/jas/.cache/pip/wheels/e9/35/86/66a473cc2abb0c7f21ed39c30a3b2219b16bd2cdb4b33cfc2c
Building wheel for commentjson (setup.py) ... done
Created wheel for commentjson: filename=commentjson-0.9.0-py3-none-any.whl size=12092 sha256=28b6413132d6d7798a18cf8c76885dc69f676ea763ffcb08775a3c2c43444f4a
Stored in directory: /home/jas/.cache/pip/wheels/7d/90/23/6358a234ca5b4ec0866d447079b97fedf9883387d1d7d074e5
Building wheel for hexdump (setup.py) ... done
Created wheel for hexdump: filename=hexdump-3.3-py3-none-any.whl size=8913 sha256=79dfadd42edbc9acaeac1987464f2df4053784fff18b96408c1309b74fd09f50
Stored in directory: /home/jas/.cache/pip/wheels/26/28/f7/f47d7ecd9ae44c4457e72c8bb617ef18ab332ee2b2a1047e87
Building wheel for pyspinel (setup.py) ... done
Created wheel for pyspinel: filename=pyspinel-1.0.3-py3-none-any.whl size=65033 sha256=01dc27f81f28b4830a0cf2336dc737ef309a1287fcf33f57a8a4c5bed3b5f0a6
Stored in directory: /home/jas/.cache/pip/wheels/95/ec/4b/6e3e2ee18e7292d26a65659f75d07411a6e69158bb05507590
Building wheel for fire (setup.py) ... done
Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116951 sha256=3d288585478c91a6914629eb739ea789828eb2d0267febc7c5390cb24ba153e8
Stored in directory: /home/jas/.cache/pip/wheels/90/d4/f7/9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
Building wheel for intervaltree (setup.py) ... done
Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26119 sha256=5ff1def22ba883af25c90d90ef7c6518496fcd47dd2cbc53a57ec04cd60dc21d
Stored in directory: /home/jas/.cache/pip/wheels/fa/80/8c/43488a924a046b733b64de3fac99252674c892a4c3801c0a61
Building wheel for lark-parser (setup.py) ... done
Created wheel for lark-parser: filename=lark_parser-0.7.8-py2.py3-none-any.whl size=62527 sha256=3d2ec1d0f926fc2688d40777f7ef93c9986f874169132b1af590b6afc038f4be
Stored in directory: /home/jas/.cache/pip/wheels/29/30/94/33e8b58318aa05cb1842b365843036e0280af5983abb966b83
Building wheel for naturalsort (setup.py) ... done
Created wheel for naturalsort: filename=naturalsort-1.5.1-py3-none-any.whl size=7526 sha256=bdecac4a49f2416924548cae6c124c85d5333e9e61c563232678ed182969d453
Stored in directory: /home/jas/.cache/pip/wheels/a6/8e/c9/98cfa614fff2979b457fa2d9ad45ec85fa417e7e3e2e43be51
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492037 sha256=57a01e68feca2b5563f5f624141267f399082d2f05f55886f71b5d6e6cf2b02c
Stored in directory: /home/jas/.cache/pip/wheels/5e/a9/47/f118e66afd12240e4662752cc22cefae5d97275623aa8ef57d
Successfully built nrfutil crcmod sly tlv8 commentjson hexdump pyspinel fire intervaltree lark-parser naturalsort future
Installing collected packages: tlv8, sortedcontainers, sly, pyserial, pyelftools, piccata, naturalsort, libusb1, lark-parser, intelhex, hexdump, fastjsonschema, crcmod, asn1crypto, wrapt, urllib3, typing_extensions, tqdm, termcolor, ruamel.yaml.clib, python-dateutil, pyspinel, pypemicro, pycryptodome, psutil, protobuf, prettytable, oscrypto, milksnake, libusbsio, jinja2, intervaltree, humanfriendly, future, frozendict, fido2, ecdsa, deepmerge, commentjson, click-option-group, click-command-tree, capstone, astunparse, argparse-addons, ruamel.yaml, pyocd-pemicro, pylink-square, pc_ble_driver_py, fire, cmsis-pack-manager, bincopy, pyocd, nrfutil, nkdfu, spsdk, pynitrokey
WARNING: The script nitropy is installed in '/home/jas/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argparse-addons-0.12.0 asn1crypto-1.5.1 astunparse-1.6.3 bincopy-17.10.3 capstone-4.0.2 click-command-tree-1.1.0 click-option-group-0.5.5 cmsis-pack-manager-0.2.10 commentjson-0.9.0 crcmod-1.7 deepmerge-0.3.0 ecdsa-0.18.0 fastjsonschema-2.16.3 fido2-1.1.0 fire-0.5.0 frozendict-2.3.5 future-0.18.3 hexdump-3.3 humanfriendly-10.0 intelhex-2.3.0 intervaltree-3.1.0 jinja2-3.0.3 lark-parser-0.7.8 libusb1-1.9.3 libusbsio-2.1.11 milksnake-0.1.5 naturalsort-1.5.1 nkdfu-0.2 nrfutil-6.1.7 oscrypto-1.3.0 pc_ble_driver_py-0.17.0 piccata-2.0.3 prettytable-2.5.0 protobuf-3.20.3 psutil-5.9.4 pycryptodome-3.17 pyelftools-0.29 pylink-square-0.11.1 pynitrokey-0.4.34 pyocd-0.31.0 pyocd-pemicro-1.1.5 pypemicro-0.1.11 pyserial-3.5 pyspinel-1.0.3 python-dateutil-2.7.5 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 sly-0.4 sortedcontainers-2.4.0 spsdk-1.7.1 termcolor-2.2.0 tlv8-0.10.0 tqdm-4.65.0 typing_extensions-4.3.0 urllib3-1.26.15 wrapt-1.15.0
jas@kaka:~$
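For the license part of such an audit, `pip3 show` prints a `License:` field for each installed distribution. A rough, hypothetical sketch that condenses that output into one line per package; license metadata is self-declared by each project, so treat the result as a starting point rather than a verified review, and vulnerability checking needs a separate tool.

```shell
# Hypothetical filter: turn `pip3 show` output into "Name<TAB>License" lines.
licenses_from_pip_show() {
  awk -F': ' '/^Name: / { name = $2 } /^License: / { print name "\t" $2 }'
}

# Usage sketch, for packages installed with pip3 install --user:
# pip3 list --user --format=freeze | cut -d= -f1 | xargs pip3 show | licenses_from_pip_show
```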
Then upgrading the device worked remarkably well, although I wish that the tool had printed URLs and checksums for the firmware files to allow easy confirmation.
jas@kaka:~$ PATH=$PATH:/home/jas/.local/bin
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.15-5D271572: Nitrokey Nitrokey Start (RTM.12.1-RC2-modified)
jas@kaka:~$ nitropy start update
Command line tool to interact with Nitrokey devices 0.4.34
Nitrokey Start firmware update tool
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
System: Linux, is_linux: True
Python: 3.10.6
Saving run log to: /tmp/nitropy.log.gc5753a8
Admin PIN:
Firmware data to be used:
- FirmwareType.REGNUAL: 4408, hash: ...b'72a30389' valid (from ...built/RTM.13/regnual.bin)
- FirmwareType.GNUK: 129024, hash: ...b'25a4289b' valid (from ...prebuilt/RTM.13/gnuk.bin)
Currently connected device strings:
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.15-5D271572
Revision: RTM.12.1-RC2-modified
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
initial device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.15-5D271572', 'Revision': 'RTM.12.1-RC2-modified', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
Please note:
- Latest firmware available is:
RTM.13 (published: 2022-12-08T10:59:11Z)
- provided firmware: None
- all data will be removed from the device!
- do not interrupt update process - the device may not run properly!
- the process should not take more than 1 minute
Do you want to continue? [yes/no]: yes
...
Starting bootloader upload procedure
Device: Nitrokey Start FSIJ-1.2.15-5D271572
Connected to the device
Running update!
Do NOT remove the device from the USB slot, until further notice
Downloading flash upgrade program...
Executing flash upgrade...
Waiting for device to appear:
Wait 20 seconds.....
Downloading the program
Protecting device
Finish flashing
Resetting device
Update procedure finished. Device could be removed from USB slot.
Currently connected device strings (after upgrade):
Device:
Vendor: Nitrokey
Product: Nitrokey Start
Serial: FSIJ-1.2.19-5D271572
Revision: RTM.13
Config: *:*:8e82
Sys: 3.0
Board: NITROKEY-START-G
device can now be safely removed from the USB slot
final device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.19-5D271572', 'Revision': 'RTM.13', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
finishing session 2023-03-16 21:49:07.371291
Log saved to: /tmp/nitropy.log.gc5753a8
jas@kaka:~$
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.19-5D271572: Nitrokey Nitrokey Start (RTM.13)
jas@kaka:~$
Before importing the master key to this device, it should be configured. Note the commands at the beginning to make sure scdaemon/pcscd is not running, because they may have cached state from earlier cards. Change the PIN codes as you like after this; my experience with Gnuk was that the Admin PIN had to be changed first, then you import the key, and then you change the PIN.
jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
OK
ERR 67125247 End of file <GPG Agent>
jas@kaka:~$ ps auxww | grep -e pcsc -e scd
jas 11651 0.0 0.0 3468 1672 pts/0 R+ 21:54 0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......:
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE5D2715720000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q
gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
If the key generation does not succeed, please check the
documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card>
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
jas@kaka:~$
Once set up, bring out your offline machine, boot it and mount your USB stick with the offline key. Your paths will be different, and this uses a somewhat unorthodox approach of working with fresh GnuPG configuration directories that I chose to keep on the USB stick.
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ cp -a gnupghome-backup-masterkey gnupghome-import-nitrokey-5D271572
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ gpg --homedir $PWD/gnupghome-import-nitrokey-5D271572 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Secret key is available.
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Really move the primary key? (y/N) y
Please select where to store the key:
(1) Signature key
(3) Authentication key
Your selection? 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg>
Save changes? (y/N) y
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$
At this point it is useful to confirm, back on your regular machine, that the Nitrokey has the master key available and that it is possible to sign statements with it:
jas@kaka:~$ gpg --card-status
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 1
KDF setting ......: on
Signature key ....: B1D2 BD13 75BE CB78 4CF4 F8C4 D73C F638 C53C 06BE
created ....: 2019-03-20 23:37:24
Encryption key....: [none]
Authentication key: [none]
General key info..: pub ed25519/D73CF638C53C06BE 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec> ed25519/D73CF638C53C06BE created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 5D271572
ssb> ed25519/80260EE8A9B92B2B created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> ed25519/51722B08FE4745A2 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> cv25519/02923D7EE76EBD60 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
jas@kaka:~$ echo foo | gpg -a --sign | gpg --verify
gpg: Signature made Thu Mar 16 22:11:02 2023 CET
gpg: using EDDSA key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$
Finally, to retrieve and sign a key: for example Andre Heinecke's, whose OpenPGP key identifier I could confirm from his business card.
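Comparing the fingerprint printed on a business card with the one gpg shows is easier when both are normalised first. The following is a minimal sketch with hypothetical helper names (normalize_fpr, long_keyid, same_fpr), not part of GnuPG. It assumes v4 fingerprints (40 hex digits), whose last 16 digits form the long key ID; that is how D73CF638C53C06BE and 1FDF723CF462B6B1 in these transcripts relate to their full fingerprints.

```bash
#!/bin/bash
# Hypothetical helpers for comparing OpenPGP v4 fingerprints.

# Remove all spaces and upper-case the hex digits (e.g. as typed from a card).
normalize_fpr() {
    local fpr="${1// /}"
    printf '%s\n' "${fpr^^}"
}

# For a v4 fingerprint (40 hex digits) the long key ID is the last 16 digits.
long_keyid() {
    local fpr
    fpr=$(normalize_fpr "$1")
    if [[ ! "$fpr" =~ ^[0-9A-F]{40}$ ]]; then
        echo "not a v4 fingerprint: $1" >&2
        return 1
    fi
    printf '%s\n' "${fpr:24:16}"
}

# Succeeds only if both fingerprints are identical after normalisation.
same_fpr() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}
```

For example, long_keyid '94A5 C9A0 3C2F E5CA 3B09 5D8E 1FDF 723C F462 B6B1' prints 1FDF723CF462B6B1. The key ID is only a convenience; when verifying someone's identity, compare the full fingerprint, e.g. with same_fpr.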
jas@kaka:~$ gpg --locate-external-keys aheinecke@gnupg.com
gpg: key 1FDF723CF462B6B1: public key "Andre Heinecke <aheinecke@gnupg.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 2 signed: 7 trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1 valid: 7 signed: 64 trust: 7-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2023-05-26
pub rsa3072 2015-12-08 [SC] [expires: 2025-12-05]
94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1
uid [ unknown] Andre Heinecke <aheinecke@gnupg.com>
sub ed25519 2017-02-13 [S]
sub ed25519 2017-02-13 [A]
sub rsa3072 2015-12-08 [E] [expires: 2025-12-05]
sub rsa3072 2015-12-08 [A] [expires: 2025-12-05]
jas@kaka:~$ gpg --edit-key "94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1"
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
sub ed25519/2978E9D40CBABA5C
created: 2017-02-13 expires: never usage: S
sub ed25519/DC74D901C8E2DD47
created: 2017-02-13 expires: never usage: A
The following key was revoked on 2017-02-23 by RSA key 1FDF723CF462B6B1 Andre Heinecke <aheinecke@gnupg.com>
sub cv25519/1FFE3151683260AB
created: 2017-02-13 revoked: 2017-02-23 usage: E
sub rsa3072/8CC999BDAA45C71F
created: 2015-12-08 expires: 2025-12-05 usage: E
sub rsa3072/6304A4B539CE444A
created: 2015-12-08 expires: 2025-12-05 usage: A
[ unknown] (1). Andre Heinecke <aheinecke@gnupg.com>
gpg> sign
pub rsa3072/1FDF723CF462B6B1
created: 2015-12-08 expires: 2025-12-05 usage: SC
trust: unknown validity: unknown
Primary key fingerprint: 94A5 C9A0 3C2F E5CA 3B09 5D8E 1FDF 723C F462 B6B1
Andre Heinecke <aheinecke@gnupg.com>
This key is due to expire on 2025-12-05.
Are you sure that you want to sign this key with your
key "Simon Josefsson <simon@josefsson.org>" (D73CF638C53C06BE)
Really sign? (y/N) y
gpg> quit
Save changes? (y/N) y
jas@kaka:~$
This is on my day-to-day machine, using the Nitrokey Start with the offline key. No need to boot the old offline machine just to sign keys or extend expiry anymore! At FOSDEM'23 I managed to get at least one DD signature on my new key, and the Debian keyring maintainers accepted my Ed25519 key. Hopefully I can now finally let my 2014-era RSA3744 key expire on 2023-09-19 and not extend it any further. This should finish my transition to a simpler OpenPGP key setup, yay!
Today I stumbled upon
this youtube video
which takes a retrocomputing look at a product I was involved in
creating in 1999. It was fascinating looking back at it, and I realized
I've never written down how this boxed set of Debian "slink and a half",
an unofficial Debian release, came to be.
As best I can remember, the CD in that box was Debian 2.1 ("slink") with
the linux kernel updated from 2.0 to 2.2. Specifically, it used VA Linux
Systems's patched version of the kernel, which supported their hardware
better, but also 2.2 generally supported a lot of hardware much better than
2.0. There were some other small modifications that got rolled back into
Debian 2.2.
I mostly remember updating the installer to support that kernel, and
building CD images. Probably over the course of a few weeks. This was the
first time I worked on the (old) Debian installer, and the first time I
built a Debian CD. I also edited the O'Reilly book that was
included in the boxed set.
It was wild when pallet loads of these boxed sets showed up. I think they
sold for $19.95 at Fry's, although VA Linux Systems also gave lots of them
away at conferences.
Watching the video of the installation, I was struck again and again by
pain points, which the video does a good job of highlighting. It was a
guided tour of everything about Debian that I wanted to fix in 1999. At
each pain point I remembered how we fixed it, often years later, after
considerable effort.
I remembered how the old installer (the boot-floppies) was mostly moribund
with only a couple people able and willing to work on it at all. (The video
is right to compare its partitioning with old Linux installers from the
early 90's because it was a relic from that era!) I remembered designing a
new Debian installer that was more modular so more people could get
invested in maintaining smaller pieces of it. It was, yes, a second system,
and developed too slowly, but was intended to withstand the test of time.
It mostly has, since it's used to this day.
I remembered how partitioning got automated in the new Debian installer,
by a new "partman" program being contributed by someone I'd never heard of
before, obsoleting some previous attempts we'd made (yay modularity).
I remembered how I started the os-prober project, which lets the Debian
installer add other OS's that are co-installed on the machine to the boot
menu. And how that got picked up even outside of Debian, by e.g. Red Hat.
I remembered working on tasksel soon after that project was started, and
all the difficult decisions about what tasks to offer and what software it
should install.
I remembered how horrible the stream of questions from package after
package was to deal with, and how I implemented debconf, which tidied that
up, integrated it into the installer's UI, made it automatable, and let
novices avoid seeing configuration that was intended for experts. And I
remembered writing dpkg-reconfigure, so that those configuration choices
could be revisited later.
It's quite possible I would not have done most of that if VA Linux Systems
had not tasked me with making this CD. The thing about releasing something
imperfect into the world is you start to feel a responsibility to improve
it...
The main critique in the video specific to this boxed set and not to any
other Debian release of this era is that this was a single CD, while 2 CDs
were needed for all of Debian at the time. And many people had only dialup
internet, so would be stuck very slowly downloading any other software they
needed. And likewise those free forever upgrades the box promised.
Oh the irony: After starting many of those projects, I left VA Linux
Systems and the lands of fast internet, and spent 4 years on dialup. Most
of that stuff was developed on dialup, though I did have about a year with
better internet at the end to put the finishing touches in the new
installer that shipped in Debian 3.1.
Yes, the dialup apt-gets were excruciatingly slow.
But the upgrades were in fact, free forever.
PS: The video's description includes
"it would take many years of effort (primarily from Ubuntu)
that would help smooth out many of the rough end of this product".
All these years later, I do continue to enjoy people involved in
Ubuntu downplaying the extent that it was a reskin of my Debian
installer shipped on a CD a few months before Debian could get around to
shipping it. Like they say, history doesn't repeat, but it does rhyme.
PPS: While researching this blog post, I found that an even more obscure,
and broken, Debian CD was produced by VA Linux in November 1999.
Distributed for free at Comdex by the thousands, this CD lacked
the Packages file that is necessary for apt-get to use it.
I don't know if any versions of that CD still exist. If you find one, email
me and I'll send some instructions I wrote up in 1999 to work around the
problem.
I won't waste your time with introductions. The title says it all, so let's jump right in. I'll give you as many links as possible so that this article stays as short as possible.
So first, what is Salsa? Salsa is the name of the GitLab instance that Debian teams use to manage Debian packages and to collaborate on development. If you have used GitLab before, the Salsa platform is no different. To get a feel for it, visit https://salsa.debian.org. Still want to know more? Find more information in the wiki. Intrigued enough to get started? Set up your account by following these instructions.
Secondly, what is Salsa CI? Debian is like many large projects with many contributors and strict maintenance requirements. This Linux distribution is made up of many packages which need to follow a certain standard and structure for the purpose of compatibility, scalability and maintainability. Salsa CI is a continuous integration tool that checks exactly that. I hope that is precise and satisfying.
I could have ended here, but since our focus is the Salsa CI tool, let me go a little deeper and wider; it will be time well spent. Salsa CI was developed to continuously check the health of Debian packages before they are uploaded to the archive, by running a series of CI/CD jobs. The jobs run against prebuilt images that are uploaded and updated regularly to reduce build time.
Salsa CI has become more and more prominent since its inception. The Salsa CI pipeline is popular (used by ~8k projects, from MariaDB to the Linux kernel packaging), and it is even the base for more complex CI pipelines used by other Linux flavours. The catch is that the more popular it becomes, the more efficient it has to get, and the more important it is to keep build times as short as possible. This happens by iterating on and testing out different tools at different stages of the pipeline to find the best one for the job. This is one of the priorities for anyone who develops or maintains Salsa CI.
So that is how deep I can go for now.
But wait, what if you want to contribute?
If you have working knowledge of bash, git, CI, Python and building Debian packages, it could be easy for you to figure out where the components are and how they interact with each other. What if you don't have that knowledge? Then that is where the fun comes in.
Making a meaningful contribution to Salsa CI takes passion and discipline; the expertise comes later, and slowly. I have contributed to Salsa CI even without high-level expertise in some of the tools. When I started contributing to Salsa CI, I didn't even know what a Debian package was, nor that the tool I was trying to navigate is used by prominent software teams. But the challenge I set for myself has, as of now, enabled me to work on a crucial part of the whole continuous integration. Wanna know what it is?
At the time of writing this article, I am integrating sbuild into Salsa CI to replace dpkg-buildpackage. This in turn will help to reduce build time by getting rid of some jobs, making the CI faster. Cool, right?
Contributing to such a significant project can be a little challenging at the start but when you realize how important the piece you are working on is, you suddenly fall in love with it and want to follow through so that you can also be part of the large community that helps to make this world a better place in obscure ways.
So why don't you check out some of the open Salsa CI issues and see if you'd be interested in improving it?
This is the second part of how I build a read-only root setup for my router. You might want to read part 1 first, which covers the initial boot and general overview of how I tie the pieces together. This post will describe how I build the squashfs image that forms the main filesystem.
Most of the build is driven from a script, make-router, which I'll dissect below. It's highly tailored to my needs, and this is a fairly lengthy post, but hopefully the steps I describe prove useful to anyone trying to do something similar.
Breakdown of make-router
#!/bin/bash

# Either rb3011 (arm) or rb5009 (arm64)
#HOSTNAME="rb3011"
HOSTNAME="rb5009"

if [ "x${HOSTNAME}" == "xrb3011" ]; then
ARCH=armhf
elif [ "x${HOSTNAME}" == "xrb5009" ]; then
ARCH=arm64
else
echo "Unknown host: ${HOSTNAME}"
exit 1
fi
It's a bash script, and I allow building for either my RB3011 or RB5009, which means a different architecture (32 vs 64 bit). I run this script on my Pi 4, which means I don't have to mess about with QemuUserEmulation.
BASE_DIR=$(dirname $0)
IMAGE_FILE=$(mktemp --tmpdir router.${ARCH}.XXXXXXXXXX.img)
MOUNT_POINT=$(mktemp -p /mnt -d router.${ARCH}.XXXXXXXXXX)

# Build and mount an ext4 image file to put the root file system in
dd if=/dev/zero bs=1 count=0 seek=1G of=${IMAGE_FILE}
mkfs -t ext4 ${IMAGE_FILE}
mount -o loop ${IMAGE_FILE} ${MOUNT_POINT}
I build the image in a loopback ext4 file on tmpfs (my Pi4 is the 8G model), which makes things a bit faster.
# Add dpkg excludes
mkdir -p ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/
cat <<EOF > ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/path-excludes
# Exclude docs
path-exclude=/usr/share/doc/*
# Only locale we want is English
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/en*/*
path-include=/usr/share/locale/locale.alias
# No man pages
path-exclude=/usr/share/man/*
EOF
Create a dpkg excludes config to drop docs, man pages and most locales before we even start the bootstrap.
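To preview what the excludes file above will drop, here is a rough emulation of dpkg's path filters in bash. It is a sketch under stated assumptions, not dpkg's own code: rules are applied in the order listed and the last matching rule wins, and * also matches /. dpkg additionally keeps directories that a later include pattern still needs; that special case is not modelled, and the function name dpkg_filter_preview is hypothetical.

```bash
#!/bin/bash
# Rough emulation of dpkg --path-exclude/--path-include ordering.
# Assumptions: last matching rule wins; '*' also matches '/'.
# dpkg's directory-keeping special case is NOT modelled.

RULES=(
    "exclude /usr/share/doc/*"
    "exclude /usr/share/locale/*"
    "include /usr/share/locale/en*/*"
    "include /usr/share/locale/locale.alias"
    "exclude /usr/share/man/*"
)

# Prints "skip" or "keep" for the given absolute path.
dpkg_filter_preview() {
    local path="$1" verdict="keep" rule action pattern
    for rule in "${RULES[@]}"; do
        action="${rule%% *}"   # first word: exclude/include
        pattern="${rule#* }"   # the rest: the glob pattern
        # Unquoted $pattern on the right of == is matched as a glob,
        # and in [[ ]] a '*' matches '/' as well.
        if [[ "$path" == $pattern ]]; then
            if [ "$action" = "exclude" ]; then
                verdict="skip"
            else
                verdict="keep"
            fi
        fi
    done
    echo "$verdict"
}
```

For instance, dpkg_filter_preview /usr/share/locale/en_GB/LC_MESSAGES/dpkg.mo prints keep, while the same file under de/ prints skip.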
Actually do the debootstrap step, including a bunch of extra packages that we want.
# Install mqtt-arp
cp ${BASE_DIR}/debs/mqtt-arp_1_${ARCH}.deb ${MOUNT_POINT}/tmp
chroot ${MOUNT_POINT} dpkg -i /tmp/mqtt-arp_1_${ARCH}.deb
rm ${MOUNT_POINT}/tmp/mqtt-arp_1_${ARCH}.deb

# Frob the mqtt-arp config so it starts after mosquitto
sed -i -e 's/After=.*/After=mosquitto.service/' ${MOUNT_POINT}/lib/systemd/system/mqtt-arp.service
I haven't uploaded mqtt-arp to Debian, so I install a locally built package, and ensure it starts after mosquitto (the MQTT broker), given they're running on the same host.
# Frob watchdog so it starts earlier than multi-user
sed -i -e 's/After=.*/After=basic.target/' ${MOUNT_POINT}/lib/systemd/system/watchdog.service

# Make sure the watchdog is poking the device file
sed -i -e 's/^#watchdog-device/watchdog-device/' ${MOUNT_POINT}/etc/watchdog.conf
Watchdog timeouts were a particular issue on the RB3011, where the default timeout didn't give enough time to reach multi-user mode before the router was reset. Not helpful, so alter the config to start it earlier (and make sure it's configured to actually kick the device file).
# Clean up docs + locales
rm -r ${MOUNT_POINT}/usr/share/doc/*
rm -r ${MOUNT_POINT}/usr/share/man/*
for dir in ${MOUNT_POINT}/usr/share/locale/*/; do
if [ "${dir}" != "${MOUNT_POINT}/usr/share/locale/en/" ]; then
rm -r ${dir}
fi
done
Clean up any docs etc that ended up installed.
# Set root password to root
echo "root:root" | chroot ${MOUNT_POINT} chpasswd
The only login method is an ssh key to the root account, though I suppose this password allows someone to execute a privilege escalation from a daemon user, so I should probably randomise it. It does need to be known, though, so that it's possible to log in via the serial console for debugging.
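On randomising the root password: one possible approach, sketched here with a hypothetical gen_root_password helper, is to draw 12 random bytes from /dev/urandom and base64-encode them into a 16-character password, printing it at build time so it can be recorded for serial-console logins. This is an assumption about how it could be done, not what make-router currently does.

```bash
#!/bin/bash
# Hypothetical helper, not part of make-router: 12 random bytes
# base64-encode to exactly 16 characters (no '=' padding), and the
# base64 alphabet never contains ':', which chpasswd uses as a separator.
gen_root_password() {
    head -c 12 /dev/urandom | base64
}

ROOT_PW=$(gen_root_password)

# In make-router this would replace the fixed "root:root" line, e.g.:
#   echo "root:${ROOT_PW}" | chroot ${MOUNT_POINT} chpasswd
# Print it so it can be written down for the serial console.
echo "Root password for this image: ${ROOT_PW}"
```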
There are config files that are easier to replace wholesale, some of which are specific to the hardware (e.g. related to network interfaces). See below for some more details.
# Build symlinks into flash for boot / modules
ln -s /mnt/flash/lib/modules ${MOUNT_POINT}/lib/modules
rmdir ${MOUNT_POINT}/boot
ln -s /mnt/flash/boot ${MOUNT_POINT}/boot
The kernel + its modules live outside the squashfs image, on the USB flash drive that the image lives on. That makes for easier kernel upgrades.
# Put our git revision into os-release
echo -n "GIT_VERSION=" >> ${MOUNT_POINT}/etc/os-release
(cd ${BASE_DIR} ; git describe --tags) >> ${MOUNT_POINT}/etc/os-release
Always helpful to be able to check the image itself for what it was built from.
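Reading that value back on the running router only needs the GIT_VERSION line from os-release. A minimal sketch, using a hypothetical image_git_version helper:

```bash
#!/bin/bash
# Hypothetical helper: print the GIT_VERSION value that make-router
# appended to os-release (defaults to /etc/os-release on the router).
image_git_version() {
    local os_release="${1:-/etc/os-release}"
    # -n suppresses normal output; the p flag prints only the line
    # where the GIT_VERSION= prefix was stripped.
    sed -n 's/^GIT_VERSION=//p' "$os_release"
}
```

Running image_git_version without an argument reads /etc/os-release. Since git describe output contains no spaces, sourcing the file in a subshell, ( . /etc/os-release; echo "$GIT_VERSION" ), should work as well.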
# Add some stuff to root's .bashrc
cat <<EOF >> ${MOUNT_POINT}/root/.bashrc
alias ls='ls -F --color=auto'
eval "\$(dircolors)"
case "\$TERM" in
xterm*|rxvt*)
PS1="\\[\\e]0;\\u@\\h: \\w\a\\]\$PS1"
;;
*)
;;
esac
EOF
Just some niceties for when I do end up logging in.
# Save the installed package list off
chroot ${MOUNT_POINT} dpkg --get-selections > /tmp/wip-installed-packages
Save off the installed package list. This was particularly useful when trying to replicate the existing router setup and making sure I had all the important packages installed. It doesn't really serve a purpose now.
In terms of the config files I copy into /etc, shared across both routers are the following:
Breakdown of shared config
In this post I will give a quick tutorial on how to set up fast Debian package builds using sbuild with mmdebstrap and apt-cacher-ng.
The usual tool for building Debian packages is dpkg-buildpackage, or a user-friendly wrapper like debuild, and while these are great tools, if you want to upload something to the Debian archive they lack the required separation from the system they are run on to ensure that your packaging also works on a different system. The usual candidate here is sbuild. But setting up a schroot is tedious and performance tuning can be annoying. There is an alternative backend for sbuild that promises to make everything simpler: unshare. In this tutorial I will show you how to set up sbuild with this backend.
In addition to the normal performance tweaking, caching downloaded packages can be a huge performance increase when rebuilding packages. I do rebuilds quite often, mostly when a new dependency got introduced that I didn't specify in debian/control yet, or when lintian notices something I can easily fix. So let's begin with setting up this caching.
Setting up apt-cacher-ng
Install apt-cacher-ng:
sudo apt install apt-cacher-ng
A pop-up will appear; if you are unsure how to answer it, select no, we don't need it for this use case.
To enable apt-cacher-ng on your system, create /etc/apt/apt.conf.d/02proxy and insert:
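As a hedged sketch of that setting, assuming apt-cacher-ng runs on the same machine and listens on its default port 3142, a single Acquire::http::Proxy line is usually enough. The helper name write_apt_proxy_conf is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: write an apt proxy snippet that routes http://
# downloads through a local apt-cacher-ng (assumed default port 3142).
write_apt_proxy_conf() {
    local conf_dir="${1:-/etc/apt/apt.conf.d}"
    printf 'Acquire::http::Proxy "http://127.0.0.1:3142";\n' \
        > "${conf_dir}/02proxy"
}

# Usage (as root): write_apt_proxy_conf /etc/apt/apt.conf.d
```

Note that this only affects http:// sources; https:// mirrors are fetched directly and bypass the cache.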
In /etc/apt-cacher-ng/acng.conf you can adjust the value of ExThreshold (in days) to hold packages for a shorter or longer duration.
The right length depends on your specific use case and resources. A longer threshold takes more disk space; a short threshold like one day effectively only reduces the build time for rebuilds.
If you encounter weird issues on apt update at some point in the future, you can try to clean the cache from apt-cacher-ng.
You can use this script:
Setting up mmdebstrap
Install mmdebstrap:
sudo apt install mmdebstrap
We will create a small helper script to ease creating a chroot. Open ~/.local/bin/mmupdate and insert:
If you execute mmupdate again, you can see that the downloading stage is much faster thanks to apt-cacher-ng. For me the difference is from about 115s to about 95s. Your results may vary; this depends on the speed of your internet connection, Debian mirror and disk.
If you have used the schroot backend and sbuild-update before, you will probably notice that creating a new chroot with mmdebstrap is slower. It would be a bit annoying to do this manually before we start a new Debian packaging session, so let's create a systemd service that does this for us.
First create a folder for user services:
mkdir -p ~/.config/systemd/user
Create ~/.config/systemd/user/mmupdate.service and add:
Now mmupdate will be run automatically every day. You can adjust the period if you think daily rebuilds are a bit excessive.
A neat advantage of periodic rebuilds is that they keep the base files in your apt-cacher-ng cache warm every time they run.
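A daily schedule for a user service is commonly driven by a matching systemd timer unit. A sketch, assuming the timer is named mmupdate.timer so that it activates mmupdate.service by default; the helper write_mmupdate_timer is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: write a systemd user timer that triggers
# mmupdate.service once a day. A timer activates the service with the
# same base name by default, so no Unit= line is needed.
write_mmupdate_timer() {
    local unit_dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$unit_dir"
    cat > "$unit_dir/mmupdate.timer" <<'EOF'
[Unit]
Description=Refresh the mmdebstrap build chroot daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After running write_mmupdate_timer, systemctl --user daemon-reload followed by systemctl --user enable --now mmupdate.timer would activate it. Persistent=true makes systemd catch up on a run that was missed while the machine was off; changing OnCalendar (e.g. to weekly) adjusts the period.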
Setting up sbuild
Install sbuild and (optionally) autopkgtest:
# backend for using mmdebstrap chroots
$chroot_mode = 'unshare';
# build in tmpfs
$unshare_tmpdir_template = '/dev/shm/tmp.sbuild.XXXXXXXX';
# upgrade before starting build
$apt_update = 1;
$apt_upgrade = 1;
# build everything including source for source-only uploads
$build_arch_all = 1;
$build_arch_any = 1;
$build_source = 1;
$source_only_changes = 1;
# go to shell on failure instead of exiting
$external_commands = { "build-failed-commands" => [ [ '%SBUILD_SHELL' ] ] };
# always clean build dir, even on failure
$purge_build_directory = "always";
# run lintian
$run_lintian = 1;
$lintian_opts = [ '-i', '-I', '-E', '--pedantic' ];
# do not run piuparts
$run_piuparts = 0;
# run autopkgtest
$run_autopkgtest = 1;
$autopkgtest_root_args = '';
$autopkgtest_opts = [ '--apt-upgrade', '--', 'unshare', '--release', '%r', '--arch', '%a', '--prefix=/dev/shm/tmp.autopkgtest.' ];
# set uploader for correct signing
$uploader_name = 'Stephan Lachnit <stephanlachnit@debian.org>';
You should adjust uploader_name. If you don't want to run autopkgtest or lintian by default, you can also disable them here. Note that for packages that need a lot of space for building, you might want to comment out the unshare_tmpdir_template line to prevent an out-of-memory build failure, since /dev/shm is a tmpfs held in RAM.
You can now build your Debian packages with the sbuild command :)
Finishing touches
You can add these variables to your ~/.bashrc as bonus (with adjusted name / email):
In particular adjust the value of parallel to ensure parallel builds.
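The variables themselves are not reproduced here. As an illustrative sketch with placeholder values: DEBFULLNAME and DEBEMAIL are read by devscripts tools such as dch, and parallel=N in DEB_BUILD_OPTIONS asks debhelper-based builds to run N jobs. Deriving N from nproc is an assumption that suits most machines:

```bash
#!/bin/bash
# Sketch of ~/.bashrc additions; name and email are placeholders.
export DEBFULLNAME="Your Name"
export DEBEMAIL="you@example.org"
# Use as many parallel build jobs as there are available CPUs.
export DEB_BUILD_OPTIONS="parallel=$(nproc)"
```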
If you are new to signing / uploading your package, first install the required tools:
If you don't introduce a new binary package, you always want to go with source-only changes.
You can now upload the package to Debian with
dput ../<filename>.changes
Update February 22nd
Jochen Sprickerhof, who originally advised me to use the unshare backend, commented that one can also use --include=auto-apt-proxy instead of the --aptopt option in mmdebstrap to detect apt proxies automatically.
He also let me know that it is possible to use autopkgtest on tmpfs (config in the blog post is updated) and added an entry on the sbuild wiki page on how to setup sbuild+unshare with ccache if you often need to build a large package.
Further, using --variant=apt and --include=build-essential will produce smaller build chroots if desired. Conversely, one can of course also use the --include option to include debhelper and lintian (or any other packages you like) to further decrease the setup time. However, staying with the buildd variant is a good choice for official uploads.
I recently got a new NVME drive. My plan was to create a fresh Debian install on an F2FS root partition with compression for maximum performance. As it turns out, this is not entirely trivial to accomplish.
For one, the Debian installer does not support F2FS (here is my attempt to add it from 2021).
And even if it did, grub does not support F2FS with the extra_attr flag that is required for compression support (at least as of grub 2.06).
Luckily, we can install Debian anyway with all these shiny new features when we take the manual road with debootstrap and use systemd-boot as the bootloader.
We can break down the process into several steps:
Warning: Playing around with partitions can easily result in data loss if you mess up! Make sure to double-check your commands and create a data backup if you don't feel confident about the process.
Creating the partition table
The first step is to create the GPT partition table on the new drive. There are several tools to do this, I recommend the ArchWiki page on this topic for details.
For simplicity I just went with GParted since it has an easy GUI, but feel free to use any other tool.
The layout should look like this:
Type          Partition         Suggested size
EFI           /dev/nvme0n1p1    512 MiB
Linux swap    /dev/nvme0n1p2    1 GiB
Linux fs      /dev/nvme0n1p3    remainder
Notes:
The disk names are just an example and have to be adjusted for your system.
Don't set disk labels; they don't appear on the new install anyway and some UEFIs might not like it on your boot partition.
The size of the EFI partition can be smaller; in practice it's unlikely that you need more than 300 MiB. However, some UEFIs might be buggy, and if you ever want to install an additional kernel or something like memtest86+ you will be happy to have the extra space.
The swap partition can be omitted; it is not strictly needed. If you need more swap for some reason, you can also add more using a swap file later (see ArchWiki page). If you know you want to use suspend-to-disk (hibernation), you want to increase the size to something more than the size of your memory.
If you used GParted, create the EFI partition as FAT32 and set the esp flag. For the root partition use ext4 or F2FS if available.
Creating and mounting the root partition
To create the root partition, we need to install the f2fs-tools first:
sudo apt install f2fs-tools
Now we can create the file system with the correct flags:
--arch sets the CPU architecture (see Debian Wiki).
--components sets the archive components; if you don't want non-free packages you might want to remove some entries here.
unstable is the Debian release, you might want to change that to testing or bookworm.
$DFS points to the mounting point of the root partition.
http://deb.debian.org/debian is the Debian mirror; you might want to set that to http://ftp.de.debian.org/debian or similar if you have a fast mirror in your area.
Chrooting into the system
Before we can chroot into the newly created system, we need to prepare and mount virtual kernel file systems. First create the directories:
Then bind-mount the directories from your system to the mount point of the new system:
sudo mount -v -B /dev $DFS/dev
sudo mount -v -B /dev/pts $DFS/dev/pts
sudo mount -v -B /proc $DFS/proc
sudo mount -v -B /sys $DFS/sys
sudo mount -v -B /run $DFS/run
sudo mount -v -B /sys/firmware/efi/efivars $DFS/sys/firmware/efi/efivars
As a last step, we need to mount the EFI partition:
sudo mount -v /dev/nvme0n1p1 $DFS/boot/efi
Now we can chroot into new system:
sudo chroot $DFS /bin/bash
Configure the base system
The first step in the chroot is setting the locales. We need this since we might leak the locales from our base system into the chroot and if this happens we get a lot of annoying warnings.
Now you have a fully functional Debian chroot! However, it is not bootable yet, so let's fix that.
Define static file system information
The first step is to make sure the system mounts all partitions on startup with the correct mount flags.
This is done in /etc/fstab (see ArchWiki page).
Open the file and change its content to:
# <file system> <mount point> <type> <options> <dump> <pass>
# NVME efi partition
UUID=XXXX-XXXX /boot/efi vfat umask=0077 0 0
# NVME swap
UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX none swap sw 0 0
# NVME main partition
UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX / f2fs compress_algorithm=zstd:6,compress_chksum,atgc,gc_merge,lazytime 0 1
You need to fill in the UUIDs for the partitions. You can use
ls -lAph /dev/disk/by-uuid/
to match the UUIDs to the more readable disk name under /dev.
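Instead of copying the UUIDs by hand, blkid can print them. The sketch below fills in the fstab shown above; it assumes p1 is the EFI partition, p2 swap and p3 root (only p1 and p3 are named in this guide, so check your own layout). fstab_for is a hypothetical helper.

```shell
# Hypothetical helper: print the fstab from above with the given UUIDs.
fstab_for() {
    efi_uuid="$1"; swap_uuid="$2"; root_uuid="$3"
    printf '%s\n' \
        '# <file system> <mount point> <type> <options> <dump> <pass>' \
        "UUID=$efi_uuid /boot/efi vfat umask=0077 0 0" \
        "UUID=$swap_uuid none swap sw 0 0" \
        "UUID=$root_uuid / f2fs compress_algorithm=zstd:6,compress_chksum,atgc,gc_merge,lazytime 0 1"
}

# Usage, from the host system:
#   uuid() { sudo blkid -s UUID -o value "$1"; }
#   fstab_for "$(uuid /dev/nvme0n1p1)" "$(uuid /dev/nvme0n1p2)" \
#       "$(uuid /dev/nvme0n1p3)" | sudo tee "$DFS/etc/fstab"
```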
Installing the kernel and bootloader
First install the systemd-boot and efibootmgr packages:
apt install systemd-boot efibootmgr
Now we can install the bootloader:
bootctl install --path=/boot/efi
You can verify the procedure worked with
efibootmgr -v
The next step is to install the kernel, you can find a fitting image with:
apt search linux-image-*
In my case:
apt install linux-image-amd64
After the installation of the kernel, apt will add an entry for systemd-boot automatically. Neat!
However, since we are in a chroot, the current settings are not bootable.
The first reason is the root partition, which will likely be the one from your current system.
To change that, navigate to /boot/efi/loader/entries; it should contain one config file.
When you open this file, it should look something like this:
title Debian GNU/Linux bookworm/sid
version 6.1.0-3-amd64
machine-id 2967cafb6420ce7a2b99030163e2ee6a
sort-key debian
options root=PARTUUID=f81d4fae-7dec-11d0-a765-00a0c91e6bf6 ro systemd.machine_id=2967cafb6420ce7a2b99030163e2ee6a
linux /2967cafb6420ce7a2b99030163e2ee6a/6.1.0-3-amd64/linux
initrd /2967cafb6420ce7a2b99030163e2ee6a/6.1.0-3-amd64/initrd.img-6.1.0-3-amd64
The PARTUUID needs to point to the partition equivalent to /dev/nvme0n1p3 on your system. You can use
ls -lAph /dev/disk/by-partuuid/
to match the PARTUUIDs to the more readable disk name under /dev.
The second problem is the ro flag in options, which tells the kernel to mount the root file system read-only.
Replace it with rw so that the root file system is mounted read-write.
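Both fixes can also be applied with sed. This is a sketch that assumes the entry looks like the one above; fix_boot_entry is a hypothetical helper, and the PARTUUID can be read with blkid -s PARTUUID -o value /dev/nvme0n1p3.

```shell
# Hypothetical helper: point root= at the new partition and mount it
# read-write, editing only the "options" line of a systemd-boot entry.
fix_boot_entry() {
    entry="$1"      # a .conf file in /boot/efi/loader/entries/
    partuuid="$2"   # PARTUUID of the new root partition
    sed -i \
        -e "/^options /s/root=PARTUUID=[^ ]*/root=PARTUUID=$partuuid/" \
        -e '/^options /s/ ro / rw /' \
        -e '/^options /s/ ro$/ rw/' \
        "$entry"
}

# Usage: fix_boot_entry /boot/efi/loader/entries/<file>.conf "$PARTUUID"
```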
Once this is fixed, the new system should be bootable. You can change the boot order by passing the entry numbers shown by efibootmgr -v, comma-separated and in the desired order:
efibootmgr --bootorder XXXX,YYYY
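Building that list by hand is easy to get wrong, so here is a sketch that moves one entry to the front. It assumes the entry is called Linux Boot Manager (the name bootctl install gives its EFI entry) and only parses the BootOrder: and BootXXXX* lines of efibootmgr's output. linux_first_bootorder is a hypothetical helper.

```shell
# Hypothetical helper: read efibootmgr output on stdin and print a BootOrder
# with the given entry (default "Linux Boot Manager") moved to the front.
linux_first_bootorder() {
    awk -v label="${1:-Linux Boot Manager}" '
        /^BootOrder:/ { order = $2 }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            num = substr($0, 5, 4)
            name = $0
            sub(/^Boot[0-9A-Fa-f]+\*? /, "", name)   # drop "Boot0003* "
            if (index(name, label) == 1) want = num
        }
        END {
            if (want == "") exit 1                 # entry not found
            out = want
            n = split(order, parts, ",")
            for (i = 1; i <= n; i++)
                if (parts[i] != want) out = out "," parts[i]
            print out
        }'
}

# Usage:
#   order=$(efibootmgr | linux_first_bootorder) && sudo efibootmgr --bootorder "$order"
```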
However, before we reboot, we might as well add a user and install some basic software.
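When you are done and want to reboot, leave the chroot with exit and undo the mounts first. A minimal sketch: umount -R (util-linux) unmounts $DFS together with everything mounted below it. leave_chroot is a hypothetical helper.

```shell
# Dry-run switch: RUN=echo prints the command instead of executing it.
RUN="${RUN:-}"

# Hypothetical helper: after typing "exit" in the chroot shell, unmount the
# target and everything mounted below it (efivars, dev/pts, boot/efi, ...).
leave_chroot() {
    $RUN sudo umount -v -R "$1"
}

# Usage: leave_chroot "$DFS"
```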
This is the story of the currently progressing changes to secure boot
on Ubuntu and the history of how we got to where we are.
taking a step back: how does secure boot on Ubuntu work?
Booting on Ubuntu involves three components after the firmware:
shim
grub
linux
Each of these is a PE binary signed with a key. The shim is signed by Microsoft's
3rd party key and embeds a self-signed Canonical CA certificate, and optionally a
vendor dbx (a list of revoked certificates or binaries). grub and linux (and fwupd)
are then signed by a certificate issued by that CA.
In Ubuntu's case, the CA's private key is sharded: multiple people each have a part
of the key and they need to meet to be able to combine it and sign things, such as
new code signing certificates.
BootHole
When BootHole happened in 2020, travel was suspended, so we could not rotate
to a new signing certificate. So when it came to updating our shim for the CVEs, we
had to revoke all previously signed kernels, grubs, shims, fwupds by their hashes.
This generated a very large vendor dbx which caused lots of issues as shim exported
them to a UEFI variable, and not everyone had enough space for such large variables.
Sigh.
We decided we wanted to rotate our signing key next time.
This was also when upstream added SBAT metadata to shim and grub. This gives
a simple versioning scheme for security updates and easy revocation using a
simple EFI variable that shim writes to and reads from.
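To make the SBAT idea concrete, here is a deliberately simplified model, not shim's actual implementation. A binary's .sbat section is CSV with one line per component (name and generation first), and the SbatLevel policy lists a minimum generation per component; a binary counts as revoked if any of its components is below that minimum, so a whole class of vulnerable binaries can be revoked by bumping one number. sbat_revoked and the sample values are illustrative.

```shell
# Simplified model of SBAT revocation, for illustration only (not shim's
# code). Only the first two CSV fields matter: component name, generation.
# sbat_revoked succeeds (exit 0) if the binary is revoked by the policy.
sbat_revoked() {
    binary_sbat="$1"   # contents of the binary's .sbat section
    policy="$2"        # SbatLevel-style policy, e.g. "sbat,1,2022052400" + "grub,2"
    awk -F, '
        NR == FNR { min[$1] = $2; next }   # first file: the policy
        ($1 in min) && ($2 + 0 < min[$1] + 0) { revoked = 1 }
        END { exit (revoked ? 0 : 1) }
    ' "$policy" "$binary_sbat"
}

# Usage: sbat_revoked grub.sbat.csv sbat-level.csv && echo "grub is revoked"
```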
Spring 2022 CVEs
We were still not ready for travel in 2021, but during BootHole we developed the
SBAT mechanism, so one could revoke a grub or shim by setting a single EFI variable.
We actually missed rotating the shim this cycle as a new vulnerability was reported
immediately after it, and we decided to hold on to it.
2022 key rotation and the fall CVEs
This caused some problems when the 2nd CVE round came, as we did not have a shim
with the latest SBAT level, and neither did a lot of others, so we ended up deciding
upstream to not bump the shim SBAT requirements just yet. Sigh.
Anyway, in October we were meeting again for the first time at a Canonical sprint,
and the shardholders got together and created three new signing keys: 2022v1, 2022v2,
and 2022v3. It took us until January before they were installed into the signing service
and PPAs were set up to sign with them.
We also submitted a shim 15.7 with the old keys revoked which came back at around
the same time.
Now we were in a hurry. The 22.04.2 point release was scheduled for around the middle
of February, and we had nothing signed with the new keys yet, but our new shim,
which we needed for the point release (so the point release media remains bootable
after the next round of CVEs), required new keys.
So how do we ensure that users have kernels, grubs, and fwupd signed with the
new key before we install the new shim?
upgrade ordering
grub and fwupd are simple cases: For grub, we depend on the new version. We decided
to backport grub 2.06 to all releases (which moved focal and bionic up from 2.04), and
kept the versioning of the -signed packages the same across all releases, so we were
able to simply bump the Depends for grub to specify the new minimum version. For fwupd-efi,
we added Breaks.
(Actually, we also had a backport of the CVEs for 2.04 based grub, and we did publish that
for 20.04 signed with the old keys before backporting 2.06 to it.)
Kernels are a different story: there are about 60 kernels out there. My initial idea was
that we could just add Breaks for all of them. So for our metapackage linux-image-generic, which
depends on linux-image-$(uname -r)-generic, we'd simply add Breaks: linux-image-generic (<< 5.19.0-31)
and then adjust those Breaks for each series. This would have been super annoying, but
ultimately I figured this would be the safest option. This however caused concern, because
apt might decide to remove the kernel metapackage instead.
I explored checking the kernels at runtime in the preinst and aborting if we don't have a trusted
kernel. This ensures that if you try to upgrade shim without having a trusted kernel,
it fails to install. But this ultimately has a couple of issues:
It aborts the entire transaction at that point, so users will be unable to run
apt upgrade until they have a recent kernel.
We cannot even guarantee that a kernel would be unpacked first. So even if you got
a new kernel, apt/dpkg might attempt to unpack shim first, and then the preinst would fail
because no new kernel is present yet.
Ultimately we believed the danger to be too large given that no kernels had yet been released
to users. If we had kernels pushed out for 1-2 months already, this would have been a viable
choice.
So in the end, I ended up modifying the shim packaging to install both the latest shim and
the previous one, and an update-alternatives alternative to select between the two:
In its post-installation maintainer script, shim-signed checks whether all kernels with a
version greater or equal to the running one are not revoked, and if so, it will setup the
latest alternative with priority 100 and the previous with a priority of 50.
If one or more of those kernels was signed with a revoked key, it will swap the priorities
around, so that the previous version is preferred.
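The decision can be sketched as follows. This is a hypothetical re-creation for illustration, not the actual shim-signed postinst: it reads "<kernel version> <revoked|ok>" lines (how revocation is detected is left out) and compares versions with GNU sort -V. version_ge and shim_priorities are hypothetical helpers.

```shell
# True if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper. Stdin: "<kernel version> <revoked|ok>" per line.
# Prints the priorities for the .latest and .previous shim alternatives.
shim_priorities() {
    running="$1"
    bad=0
    while read -r ver status; do
        # Only kernels at or above the running one matter.
        if version_ge "$ver" "$running" && [ "$status" = revoked ]; then
            bad=1
        fi
    done
    if [ "$bad" -eq 0 ]; then
        echo "latest=100 previous=50"
    else
        echo "latest=50 previous=100"
    fi
}

# The priorities would then be registered with something like:
#   update-alternatives --install /usr/lib/shim/shimx64.efi.signed \
#       shimx64.efi.signed /usr/lib/shim/shimx64.efi.signed.latest 100
```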
Now this is fairly static, and we do want you to switch to the latest shim eventually, so
I also added hooks to the kernel install to trigger the shim-signed postinst script when
a new kernel is being installed. It will then update the alternatives based on the current
set of kernels, and if it now points to the latest shim, reinstall shim and grub to the
ESP.
Ultimately this means that once you install your 2nd non-revoked kernel, or you install
a non-revoked kernel and then reconfigure shim or the kernel, you will get the latest
shim. When you install your first non-revoked kernel, your currently booted kernel is
still revoked, so it's not upgraded immediately. This has a benefit in that you will
most likely have two kernels you can boot without disabling secure boot.
regressions
Of course, the first version I uploaded still had some hardcoded shimx64 references
in the scripts and so failed to install on arm64, where shimaa64 is used. And if that
were not enough, I also forgot to include support for gzip-compressed kernels there.
Sigh, I need better testing infrastructure to be able to easily run arm64 tests as
well (I only tested the actual booting there, not the scripts).
shim-signed migrated to the release pocket in lunar fairly quickly, but this caused
images to stop working, because the new shim was installed into images, but no
kernel was available yet, so we had to demote it to proposed and block migration.
Despite all the work done for end users, we need to be careful to roll this out for
image building.
another grub update for OOM issues.
We had two grubs to release: First there was the security update for the recent set
of CVEs, then there also was an OOM issue for large initrds which was blocking critical
OEM work.
We fixed the OOM issue by cherry-picking all 2.12 memory management patches, as well
as the red hat patches to the loader we take from there. This ended up a fairly large
patch set and I was hesitant to tie the security update to that, so I ended up pushing
the security update everywhere first, and then pushed the OOM fixes this week.
With the OOM patches, you should be able to boot initrds of between 400M and 1GB; it
also depends on the memory layout of your machine, your screen resolution and background
images. The OEM team successfully tested 400MB on real hardware, and I tested up to, I think, 1.2GB
in qemu; I ran out of FAT space then and stopped going higher :D
other features in this round
Intel TDX support in grub and shim
Kernels are now allocated as CODE, not DATA, as per the upstream mm changes; this might fix boot on the X13s
am I using this yet?
The new signing keys are used in:
shim-signed 1.54 on 22.10+, 1.51.3 on 22.04, 1.40.9 on 20.04, 1.37~18.04.13 on 18.04
grub2-signed 1.187.2~ or newer (binary packages grub-efi-amd64-signed or grub-efi-arm64-signed),
1.192 on 23.04.
fwupd-signed 1.51~ or newer
various linux updates. Check apt changelog linux-image-unsigned-$(uname -r) to see whether
Revoke & rotate to new signing key (LP: #2002812) is mentioned there; if it is, the kernel is
signed with the new key.
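That check can be scripted; the sketch below reads a changelog on stdin so it can be tried offline. mentions_new_key is a hypothetical helper; the string it looks for is the changelog entry quoted above.

```shell
# Hypothetical helper: succeed if the changelog on stdin mentions the key
# rotation (fixed-string match on the entry quoted above).
mentions_new_key() {
    grep -qF 'Revoke & rotate to new signing key (LP: #2002812)'
}

# Usage:
#   if apt changelog "linux-image-unsigned-$(uname -r)" | mentions_new_key; then
#       echo "running kernel is signed with the new key"
#   fi
```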
If you were able to install shim-signed, your grub and fwupd-efi will have the correct
version, as that is ensured by packaging. However, your shim may still point to the old one.
To check which shim will be used by grub-install, you can check the status of the shimx64.efi.signed
or (on arm64) shimaa64.efi.signed alternative. The best link needs to point to the file ending in
latest:
$ update-alternatives --display shimx64.efi.signed
shimx64.efi.signed - auto mode
link best version is /usr/lib/shim/shimx64.efi.signed.latest
link currently points to /usr/lib/shim/shimx64.efi.signed.latest
link shimx64.efi.signed is /usr/lib/shim/shimx64.efi.signed
/usr/lib/shim/shimx64.efi.signed.latest - priority 100
/usr/lib/shim/shimx64.efi.signed.previous - priority 50
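The same check as a small script, as a sketch: it picks the alternative name for the architecture (shimx64 on amd64, shimaa64 on arm64, as described above) and looks for the "link currently points to" line ending in .latest. shim_alt_name and points_to_latest are hypothetical helpers.

```shell
# Hypothetical helper: alternative name for an architecture (default: this
# machine's, via dpkg). Other architectures are not covered here.
shim_alt_name() {
    case "${1:-$(dpkg --print-architecture)}" in
        amd64) echo shimx64.efi.signed ;;
        arm64) echo shimaa64.efi.signed ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: reads "update-alternatives --display" output on stdin.
points_to_latest() {
    grep -q 'link currently points to .*\.latest$'
}

# Usage:
#   update-alternatives --display "$(shim_alt_name)" | points_to_latest && echo "latest shim"
```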
If it does not, but you have installed a new kernel compatible with the new shim, you can
switch immediately to the new shim after rebooting into that kernel by running dpkg-reconfigure shim-signed. You'll see in the output whether the shim was updated, or you can check the output
of update-alternatives as you did above after the reconfiguration has finished.
For the out of memory issues in grub, you need grub2-signed 1.187.3~ (same binaries
as above).
how do I test this (while it's in proposed)?
upgrade your kernel to proposed and reboot into that
upgrade your grub-efi-amd64-signed, shim-signed, fwupd-signed to proposed.
If you already upgraded your shim before your kernel, don't worry:
upgrade your kernel and reboot
run dpkg-reconfigure shim-signed
And you'll be all good to go.
deep dive: uploading signed boot assets to Ubuntu
For each signed boot asset, we build one version in the latest stable release and the
development release. We then binary copy the built binaries from the latest stable release
to older stable releases. This process ensures two things: We know the next stable release
is able to build the assets and we also minimize the number of signed assets.
OK, I lied. For shim, we actually do not build in the development release but copy the
binaries upward from the latest stable, as each shim needs to go through external signing.
The entire workflow looks something like this:
Upload the unsigned package to one of the following build PPAs:
Copy the unsigned package back across all stable releases in the PPA
Upload the signed package for stable releases to the same PPA with ~<release>.1 appended to the version
Submit a request to canonical-signing-jobs to sign the uploads.
The signing job helper copies the binary -unsigned packages to the primary-2022v1 PPA where they are
signed, creating a signing tarball, then it copies the source package for the -signed package to the
same PPA which then downloads the signing tarball during build and places the signed assets into
the -signed deb.
Resulting binaries will be placed into the proposed PPA: https://launchpad.net/~ubuntu-uefi-team/+archive/ubuntu/proposed
Review the binaries themselves
Unembargo and binary copy the binaries from the proposed PPA to the proposed-public PPA: https://launchpad.net/~ubuntu-uefi-team/+archive/ubuntu/proposed-public.
This step is not strictly necessary, but it enables tools like sru-review to work, as they cannot access the packages from the normal private proposed PPA.
Binary copy from proposed-public to the proposed queue(s) in the primary archive
Lots of steps!
WIP
As of writing, only the grub updates have been released, other updates are still being
verified in proposed. An update for fwupd in bionic will be issued at a later point, removing
the EFI bits from the fwupd 1.2 packaging and using the separate fwupd-efi project instead
like later release series.
I read somewhere a nice meme about Linux: Do you want an operating system or do you want an adventure? I love
it, because it is so true. What you are about to read is my adventure to set a usable screen resolution in a fresh
Debian testing installation.
The context is that I have two different Lenovo Thinkpad laptops with 16" screens and NVIDIA graphics cards. They are both
installed with the latest Debian testing. I use the closed-source NVIDIA drivers (they seem to work better than the nouveau
module). The display manager and desktop environment that I use are lightdm + XFCE4. The native monitor resolution on both machines
is very high: 3840x2160 (or 4K UHD if you will).
The thing is that both laptops show an identical problem: when freshly installed with the Debian default config,
the native resolution is in use. For a 16" screen laptop, this high resolution means that the font is tiny.
Therefore, the raw native resolution renders the machine almost unusable.
This is a picture of what you get by running htop in the console (tty1, the terminal you would get by
hitting CTRL+ALT+F1) with the default install:
Everything in the system is affected by this:
the grub menu is unreadable. Thankfully the right option is selected by default.
the tty console, with the boot splash by systemd, is unreadable as well. There are some colors, so you at least see some systemd stuff happening in green.
when lightdm starts, the resolution is still very high. You can barely click the login button.
when XFCE4 starts, it is a pain to navigate the menu and click the right buttons to set a more reasonable resolution.
The adventure begins after installing the system. Each of these four points must be fixed by hand by the user.
XFCE4
Point #4 is the easiest. Navigate with the mouse pointer to the tiny Applications menu, then Settings, then Displays.
This is more or less the same in every other desktop operating system. There are no further actions required to persist this
setting. Thank you, XFCE4.
lightdm
Point #3, about lightdm, is more tricky to solve. It involves running xrandr when lightdm sets up the display.
Nobody will tell you this trick. You have to search for it on the internet. Thankfully it is a common problem, and a
person who knows what to search for can find good results.
The file /etc/lightdm/lightdm.conf needs to contain something like this:
[LightDM]
[Seat:*]
# set up correct display resolution
display-setup-script=sh -c -- "xrandr -s 1920x1080"
By the way, depending on your system hardware setup, you may also need an additional call to xrandr here. If you
want to plug in an HDMI monitor, chances are you require something like xrandr --setprovideroutputsource NVIDIA-G0 modesetting && xrandr --auto
to instruct the NVIDIA graphics card to work well with the kernel graphics system.
In my case, one of my laptops requires it, so I have:
[LightDM]
[Seat:*]
# don't ask me to type my username
greeter-hide-users=false
# set up correct display resolution, and prepare NVIDIA card for HDMI output
display-setup-script=sh -c "xrandr -s 1920x1080 && xrandr --setprovideroutputsource NVIDIA-G0 modesetting && xrandr --auto"
grub
Point #1 about the grub menu is also not trivial to solve, but also widely known on the internet. Grub allows you to
set arbitrary graphical modes. In Debian systems, adding something like GRUB_GFXMODE=1024x768 to /etc/default/grub and then
running sudo update-grub should do the magic.
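If you prefer not to edit /etc/default/grub by hand, a small helper can set the variable idempotently. This is a sketch; set_grub_var is a hypothetical helper that uncomments an existing "#KEY=..." line (Debian ships #GRUB_GFXMODE=640x480 commented out) or appends a new one.

```shell
# Hypothetical helper: set KEY=value in /etc/default/grub, uncommenting an
# existing "#KEY=..." line or appending one. Run update-grub afterwards.
set_grub_var() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^#\?$key=" "$file"; then
        sed -i "s|^#\?$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (as root):
#   set_grub_var /etc/default/grub GRUB_GFXMODE 1024x768 && update-grub
```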
console
So we get to point #2 about the tty1 console. For months, I've been investing my scarce personal time into trying to
solve this annoyance. There is a lot of conflicting information about this on the internet: plenty of misleading solutions,
essays about framebuffers, kernel modesetting, and other stuff I don't want to read just to get my tty1 into a readable state.
People point in different directions, like using GRUB_GFXPAYLOAD_LINUX=keep in /etc/default/grub. That sounds like a good solution,
but it won't work: my best bet is that the kernel indeed keeps the resolution as told by grub, but the moment systemd loads the nvidia
driver, it enables 4K on the display and the console gets the high resolution.
Actually, for a few weeks, I blamed plymouth. Because the plymouth service is loaded early by
systemd, it could be responsible for setting some of the display settings. It actually contains some (undocumented)
DeviceScale configuration option that is seemingly aimed at high-resolution scenarios. I played with it to no avail.
Some folks on IRC suggested reconfiguring the console-setup package, which was unknown to me back then. Running
sudo dpkg-reconfigure console-setup would indeed show a menu to select some preferences for the console, including font size.
But apparently, a freshly installed system already uses the biggest possible size, so this was a dead end.
Another option I evaluated for a few days was touching the kernel framebuffer settings. I honestly don't understand this, and all the
solutions pointing to fbset didn't work for me anyway. This is the default framebuffer configuration on one of the laptops:
Playing with these numbers, I was able to modify the geometry of the console, only to reduce the panel to a tiny square in the console
display (with equally small fonts anyway). If it was possible to scale or resize the panel in other way, I was unable to understand
how to do so by reading the associated docs.
One day, out of despair, I tried disabling kernel modesetting (or KMS). It indeed got me a more readable tty1, only to prevent
the whole graphic stack from starting, with Xorg complaining about the lack of kernel modeset.
After lots of wasted time, I decided to blame the NVIDIA graphic card. Because why not: a closed source module in my system looks fishy.
I registered in their official forum and wrote a message about my suspicion on the module, asking for advice on how
to modify the driver default resolution. I was hoping that something like modprobe nvidia my_desired_resolution=1920x1080 could
exist. Apparently not :-(
I was about to give up. I had walked every corner of the known internet. I even tried summoning the ancient gods, I used ChatGPT.
I asked the AI god for mercy, for a working solution to no avail.
Then I decided to change the kind of queries I was issuing to the search engines (don't ask me, I no longer remember). Eventually I landed on
this askubuntu.com page. The question described the exact same problem I was experiencing. Finally, that was encouraging!
I was not alone in my adventure after all!
The solution section included a font size I hadn t seen before in my previous tests: 16x32. More excitement!
I did all the steps. I installed the xfonts-terminus package, and in the file /etc/default/console-setup I put:
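The exact lines are not shown here; a plausible version, assuming the FONTFACE and FONTSIZE keys documented in console-setup(5) and the Terminus face in the 16x32 size mentioned above, is sketched below. set_console_font is a hypothetical helper that replaces existing keys or appends them.

```shell
# Hypothetical helper. FONTFACE="Terminus" is our assumption; the text only
# names the 16x32 size. Run setupcon afterwards, from a tty.
set_console_font() {
    file="${1:-/etc/default/console-setup}"
    for kv in 'FONTFACE="Terminus"' 'FONTSIZE="16x32"'; do
        key="${kv%%=*}"
        if grep -q "^$key=" "$file"; then
            sed -i "s|^$key=.*|$kv|" "$file"
        else
            printf '%s\n' "$kv" >> "$file"
        fi
    done
}

# Usage (as root): set_console_font && setupcon
```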
Then I ran setupcon from a tty, and the miracle happened! I finally got a bigger font in the tty1 console!
It turned out the solution was about playing with console-setup, which I had tried without success before.
I'm not even sure if the additional package was required.
This is how my console looks now:
The truth is the solution is satisfying only to a degree. I'm a person with good eyesight and can work with
these slightly larger fonts. Honestly, I'm not sure if I can get larger fonts using this method.
After some search, I discovered that some folks already managed to describe the problem in detail and
filed a proper bug report in Debian, see #595696 opened more than 10 years ago.
2023 is the year of linux on the desktop
Nope.
I honestly don't see how this disconnected pile of settings can all be reconciled.
Can we please have a systemd-whatever that homogenizes all of this mess?
I'm referring to grub + kernel drivers + console + lightdm + XFCE4.
Next adventure
When I lock the desktop (with CTRL+ALT+L) and close the laptop lid to suspend it, then reopen it and type the login info
into the lightdm greeter, the desktop environment never loads: black screen.
I have already tried the first few search results without luck. Perhaps the nvidia card is to blame this time? Perhaps
power management is poorly coordinated between the different pieces of system software?
Who knows what's going on here. This will probably be my next Debian desktop adventure.
When rebuilding mozc with the Mozc UT Dictionary, it may be better to build in a Docker container, because you don't want to install unused IM development packages.
Beforehand, download the latest Mozc UT dictionary from osdn.net.
In a debian/sid container, you need to do it:
% dpkg -l | grep mozc
ii emacs-mozc 2.28.4715.102+dfsg-2.2.1 amd64 Mozc for Emacs
ii emacs-mozc-bin 2.28.4715.102+dfsg-2.2.1 amd64 Helper module for emacs-mozc
ii fcitx-mozc-data 2.28.4715.102+dfsg-2.2.1 all Mozc input method - data files for fcitx
ii fcitx5-mozc:amd64 2.28.4715.102+dfsg-2.2.1 amd64 Mozc engine for fcitx5 - Client of the Mozc input method
ii ibus-mozc 2.28.4715.102+dfsg-2.2.1 amd64 Mozc engine for IBus - Client of the Mozc input method
ii mozc-data 2.28.4715.102+dfsg-2.2.1 all Mozc input method - data files
ii mozc-server 2.28.4715.102+dfsg-2.2.1 amd64 Server of the Mozc input method
ii mozc-utils-gui 2.28.4715.102+dfsg-2.2.1 amd64 GUI utilities of the Mozc input method
I'm migrating some self-hosted virtual machines to Trisquel, and noticed that Trisquel does not offer cloud images similar to the Debian Cloud and Ubuntu Cloud images. Thus my earlier approach based on virt-install --cloud-init and cloud-localds does not work with Trisquel. While I hope that Trisquel will eventually publish cloud-compatible images, I wanted to document an alternative approach for Trisquel based on preseeding. This is how I used to install Debian and Ubuntu in the old days, and the automated preseed method is best documented in the Debian installation manual. I was hoping to forget about the preseed format, but maybe it will become one of those legacy technologies that never really disappear? Like FAT16 and 8-bit microcontrollers.
Below I assume you have a virtual machine host server up that runs libvirt and has virt-install and similar tools; install them with the following command. I run a pre-release version of Trisquel 11 aramo on my VM-host, but I believe any recent dpkg-based distribution like Trisquel 9/10, PureOS 10, Debian 11 or Ubuntu 20.04/22.04 would work.
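A sketch of such a command follows. The package list is our assumption for a Debian-style host: libvirt-daemon-system (the libvirt daemon), virtinst (virt-install), and genisoimage plus cpio, which the gen-preseed-iso script below calls. install_vm_host_tools is a hypothetical helper; RUN=echo prints the command instead of running it.

```shell
# Dry-run switch: RUN=echo prints the command instead of executing it.
RUN="${RUN:-}"
# Assumed package list; adjust for your distribution.
vm_host_packages="libvirt-daemon-system virtinst genisoimage cpio"

# Hypothetical helper; run as root.
install_vm_host_tools() {
    # Word splitting of the package list is intended here.
    $RUN apt-get install -y $vm_host_packages
}

# Usage (as root): install_vm_host_tools
```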
The approach can install Trisquel 9 (etiona), Trisquel 10 (nabia) and the pre-release of Trisquel 11. First download and verify the integrity of the netinst images that we will need. Unfortunately the Trisquel 11 netinst beta image does not have any checksum or signature available.
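For the images that do come with a checksum file, the verification step can be sketched as below. It assumes a SHA256SUMS-style file ("<hex digest>  <file name>" per line); file names are placeholders, and if the checksum file is signed, check that signature with gpg first. As noted above, this cannot be applied to the Trisquel 11 netinst beta. verify_iso is a hypothetical helper.

```shell
# Hypothetical helper: compare an ISO's SHA-256 with its entry in a
# SHA256SUMS-style file. Succeeds only if an entry exists and matches.
verify_iso() {
    sums="$1"; iso="$2"
    name=$(basename "$iso")
    # Accept both "digest  name" and binary-mode "digest *name" entries.
    expected=$(awk -v f="$name" '$2 == f || $2 == "*" f { print $1 }' "$sums")
    [ -n "$expected" ] || return 1
    actual=$(sha256sum "$iso" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}

# Usage: verify_iso SHA256SUMS trisquel-netinst_10.0.1_amd64.iso && echo OK
```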
I have developed the following fairly minimal preseed file that works with all three Trisquel releases. Compare it against the official Trisquel 11 preseed skeleton and the Debian 11 example preseed file. You should modify obvious things like SSH key, host/IP settings, partition layout and decide for yourself how to deal with passwords. While Ubuntu/Trisquel usually wants to set up a user account, I prefer to log in as root, hence setting passwd/root-login to true and passwd/make-user to false.
Use the file above as a skeleton for preparing a VM-specific preseed file as follows. The environment variables HOST and IPS will be used later on too.
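The templating step can be done with sed. This sketch assumes the skeleton contains @HOST@ and @IP@ placeholders (hypothetical markers; use whatever your skeleton contains, e.g. in lines like d-i netcfg/get_hostname string @HOST@) and writes vm-$HOST.preseed to the current directory. make_vm_preseed is a hypothetical helper.

```shell
# Hypothetical helper: fill the assumed @HOST@ and @IP@ placeholders in a
# preseed skeleton and write vm-<host>.preseed in the current directory.
make_vm_preseed() {
    template="$1"; host="$2"; ip="$3"
    sed -e "s/@HOST@/$host/g" -e "s/@IP@/$ip/g" "$template" > "vm-$host.preseed"
}

# Usage:
#   HOST=foo IPS=192.0.2.10
#   make_vm_preseed vm-template.preseed "$HOST" "$IPS"
```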
The following script is used to prepare the ISO images with the preseed file that we will need. This script is inspired by the Debian Wiki Preseed EditIso page and the Trisquel ISO customization wiki page. There are a couple of variations based on earlier works. Paths are updated to match the Trisquel netinst ISO layout, which differs slightly from Debian. We modify isolinux.cfg to boot the auto label without a timeout. On Trisquel 11 the auto boot label exists, but on Trisquel 9 and Trisquel 10 it does not, so we add it in order to be able to start the automated preseed installation.
root@trana:~# cat > gen-preseed-iso
#!/bin/sh
# Copyright (C) 2018-2022 Simon Josefsson -- GPLv3+
# https://wiki.debian.org/DebianInstaller/Preseed/EditIso
# https://trisquel.info/en/wiki/customizing-trisquel-iso
# Usage: gen-preseed-iso INPUT.iso PRESEED-FILE OUTPUT.iso (as root, for mount)
set -e
set -x
ISO="$1"
PRESEED="$2"
OUTISO="$3"
LASTPWD="$PWD"
test -f "$ISO"
test -f "$PRESEED"
test ! -f "$OUTISO"
TMPDIR=$(mktemp -d)
mkdir "$TMPDIR/mnt"
mkdir "$TMPDIR/tmp"
cp "$PRESEED" "$TMPDIR"/preseed.cfg
cd "$TMPDIR"
# Copy the read-only ISO contents into a writable tree.
mount "$ISO" mnt/
cp -rT mnt/ tmp/
umount mnt/
chmod +w -R tmp/
# Append preseed.cfg to the initrd's cpio archive so the installer finds it.
gunzip tmp/initrd.gz
echo preseed.cfg | cpio -H newc -o -A -F tmp/initrd
gzip tmp/initrd
chmod -w -R tmp/
# Boot the "auto" label right away instead of waiting at the menu.
sed -i "s/timeout 0/timeout 1/" tmp/isolinux.cfg
sed -i "s/default vesamenu.c32/default auto/" tmp/isolinux.cfg
# Trisquel 9 and 10 have no "auto" label, so add one.
if ! grep -q auto tmp/adtxt.cfg; then
cat<<EOF >> tmp/adtxt.cfg
label auto
menu label ^Automated install
kernel linux
append auto=true priority=critical vga=788 initrd=initrd.gz --- quiet
EOF
fi
# Regenerate the checksum list for the modified tree.
cd tmp/
find -follow -type f | xargs md5sum > md5sum.txt
cd ..
cd "$LASTPWD"
genisoimage -r -J -b isolinux.bin -c boot.cat \
-no-emul-boot -boot-load-size 4 -boot-info-table \
-o "$OUTISO" "$TMPDIR/tmp/"
rm -rf "$TMPDIR"
exit 0
^D
root@trana:~# chmod +x gen-preseed-iso
root@trana:~#
Next run the command on one of the downloaded ISO images and the generated preseed file.
root@trana:~# ./gen-preseed-iso /root/iso/trisquel-netinst_10.0.1_amd64.iso vm-$HOST.preseed vm-$HOST.iso
+ ISO=/root/iso/trisquel-netinst_10.0.1_amd64.iso
+ PRESEED=vm-foo.preseed
+ OUTISO=vm-foo.iso
+ LASTPWD=/root
+ test -f /root/iso/trisquel-netinst_10.0.1_amd64.iso
+ test -f vm-foo.preseed
+ test ! -f vm-foo.iso
+ mktemp -d
+ TMPDIR=/tmp/tmp.mNEprT4Tx9
+ mkdir /tmp/tmp.mNEprT4Tx9/mnt
+ mkdir /tmp/tmp.mNEprT4Tx9/tmp
+ cp vm-foo.preseed /tmp/tmp.mNEprT4Tx9/preseed.cfg
+ cd /tmp/tmp.mNEprT4Tx9
+ mount /root/iso/trisquel-netinst_10.0.1_amd64.iso mnt/
mount: /tmp/tmp.mNEprT4Tx9/mnt: WARNING: source write-protected, mounted read-only.
+ cp -rT mnt/ tmp/
+ umount mnt/
+ chmod +w -R tmp/
+ gunzip tmp/initrd.gz
+ echo preseed.cfg
+ cpio -H newc -o -A -F tmp/initrd
5 blocks
+ gzip tmp/initrd
+ chmod -w -R tmp/
+ sed -i s/timeout 0/timeout 1/ tmp/isolinux.cfg
+ sed -i s/default vesamenu.c32/default auto/ tmp/isolinux.cfg
+ grep -q auto tmp/adtxt.cfg
+ cat
+ cd tmp/
+ find -follow -type f
+ xargs md5sum
+ cd ..
+ cd /root
+ genisoimage -r -J -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o vm-foo.iso /tmp/tmp.mNEprT4Tx9/tmp/
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using GCRY_000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha512.mod (gcry_sha256.mod)
Using XNU_U000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/xnu_uuid.mod (xnu_uuid_test.mod)
Using PASSW000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/password_pbkdf2.mod (password.mod)
Using PART_000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/part_sunpc.mod (part_sun.mod)
Using USBSE000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_pl2303.mod (usbserial_ftdi.mod)
Using USBSE001.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_ftdi.mod (usbserial_usbdebug.mod)
Using VIDEO000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/videotest.mod (videotest_checksum.mod)
Using GFXTE000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gfxterm_background.mod (gfxterm_menu.mod)
Using GCRY_001.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha256.mod (gcry_sha1.mod)
Using MULTI000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/multiboot2.mod (multiboot.mod)
Using USBSE002.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_usbdebug.mod (usbserial_common.mod)
Using MDRAI000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/mdraid09.mod (mdraid09_be.mod)
Size of boot image is 4 sectors -> No emulation
22.89% done, estimate finish Thu Dec 29 23:36:18 2022
45.70% done, estimate finish Thu Dec 29 23:36:18 2022
68.56% done, estimate finish Thu Dec 29 23:36:18 2022
91.45% done, estimate finish Thu Dec 29 23:36:18 2022
Total translation table size: 2048
Total rockridge attributes bytes: 24816
Total directory bytes: 40960
Path table size(bytes): 64
Max brk space used 46000
21885 extents written (42 MB)
+ rm -rf /tmp/tmp.mNEprT4Tx9
+ exit 0
root@trana:~#
Now the image is ready for installation, so invoke virt-install as follows. The machine will start directly, launching the preseed automatic installation. At this point, I usually click on the virtual machine in virt-manager to follow screen output until the installation has finished. If everything works OK, the machine comes up and I can ssh into it.
root@trana:~# virt-install --name $HOST --disk vm-$HOST.img,size=5 --cdrom vm-$HOST.iso --osinfo linux2020 --autostart --noautoconsole --wait
Using linux2020 default --memory 4096
Starting install...
Allocating 'vm-foo.img' 0 B 00:00:00 ...
Creating domain... 0 B 00:00:00
Domain is still running. Installation may be in progress.
Waiting for the installation to complete.
Domain has shutdown. Continuing.
Domain creation completed.
Restarting guest.
root@trana:~#
There are some problems that I have noticed that would be nice to fix, but they are easy to work around. The first is that at the end of the installation of Trisquel 9 and Trisquel 10, the VM hangs after displaying "Sent SIGKILL to all processes" followed by "Requesting system reboot". I kill the VM manually using virsh destroy foo and start it up again using virsh start foo. For production use I expect to be running Trisquel 11, where the problem doesn't happen, so this does not bother me enough to debug further. The remaining issue is that once booted, a Trisquel 11 VM has lost its DNS nameserver configuration, presumably due to poor integration with systemd-resolved. Both Trisquel 9 and Trisquel 10 use systemd-resolved and DNS works there after first boot, so this appears to be a Trisquel 11 bug. You can work around it with rm -f /etc/resolv.conf && echo 'nameserver A.B.C.D' > /etc/resolv.conf or drink the systemd Kool-Aid.
If you want to clean up and restart the process, here is how you wipe out what you did. After this, you may run the sed, ./gen-preseed-iso and virt-install commands again. Remember, use virsh shutdown foo to gracefully shut down a VM.
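The clean-up mentioned above can be sketched like this: force the VM off, remove its libvirt definition, and delete the three generated files so that gen-preseed-iso (which refuses to overwrite its output) and virt-install can run again. wipe_vm is a hypothetical helper; RUN=echo prints the commands instead of running them.

```shell
# Dry-run switch: RUN=echo prints the commands instead of executing them.
RUN="${RUN:-}"

# Hypothetical helper: remove one VM created as above ($1 is $HOST).
wipe_vm() {
    host="$1"
    $RUN virsh destroy "$host" || true   # fails harmlessly if already off
    $RUN virsh undefine "$host"
    $RUN rm -f "vm-$host.img" "vm-$host.iso" "vm-$host.preseed"
}

# Usage (as root, in the directory with the files): wipe_vm "$HOST"
```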
I had bought a Thinkpad E470 laptop back in 2018 which was lying unused for
quite some time. Recently, when I wanted to use it, I found that some keys on
the keyboard were not working, and after some time the laptop would hang at the
Lenovo boot screen. I came back to Bangalore from my hometown after almost 2
years (WFH due to Covid) and thought it was the right time to get my laptop
back to a normal working state. After getting the keyboard replaced I noticed
that the 1TB HDD was no longer fast enough for my taste! I have to admit I never
thought I would start disliking HDDs so quickly, thanks to modern SSD-based work
laptops. So as a second upgrade I got the HDD removed from my laptop and got a
240G SSD. Yeah, I know it's a reduction from my original size, but I intend to
keep using my old HDD via a USB SATA enclosure as an external drive which can
house the extra data I need to save.
So now that I had an SSD, I needed to install Debian Unstable on it again, and
this is where I tried something new. My colleague (name redacted on request)
suggested I use the GRML live CD and install Debian via debootstrap. After
giving it some thought, I decided to try this out. Some reasons for going ahead
with this are listed below:
Debian Installer does not support a proper BTRFS-based root file system. It
just allows btrfs as root, but without subvolume support. Also, I'm not sure
about LUKS support with btrfs as root.
I also wanted to give systemd-boot a try, as my laptop is UEFI capable and
I've slowly started disliking GRUB.
I really hate installing task-kde-desktop (yeah, you read that right, I've been
a KDE user for quite some time), which pulls in tons of unwanted stuff and
bloat. Well, it's not just task-kde-desktop; any other task-desktop package
does the same, and I don't want too much unused stuff and too many services
running.
Disk Preparation
As a first step I went to the GRML website and downloaded the current
pre-release. Frankly, this was my first time using GRML and I was not sure what
to expect. When I booted it up I was a bit taken aback to see it's
console-based, and since I did not have a wired LAN, just a plain wireless
dongle (Jiofi device), I was wondering what it would take to connect. But
surprisingly, the curses-based UI was pretty straightforward and let me connect
to the Wi-Fi AP. Another plus was that the rescue CD shipped non-free firmware,
which the laptop's ath10k device needed in order to operate.
Once I got a shell prompt in the rescue CD, the first thing I did was to
reconfigure console-setup to increase the font size, which was very small on
default boot. Once that was done, I did the following to create a 1G (FAT32)
partition for EFI.
parted -a optimal -s /dev/sda mklabel gpt
parted -a optimal -s /dev/sda mkpart primary vfat 0% 1G
parted -a optimal -s /dev/sda set 1 esp on
mkfs.vfat -n boot_disk -F 32 /dev/sda1
So here is what I did: created a 1G vfat type partition and set the esp flag on
it. This will be mounted to /boot/efi for systemd-boot. Next I created a single
partition on the rest of the available free disk which will be used as the root
file system.
Next I encrypted the root partition using LUKS and then created the BTRFS file
system on top of it.
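The partitioning, encryption and mkfs steps form a short sequence. Here is a sketch of them, assuming the disk is /dev/sda (so the new partition becomes /dev/sda2), the mapper name ENC used by the mount commands below, and the btrfs label root_disk that the kernel command line refers to later. The prepare_root helper is hypothetical and only prints the commands unless DRY_RUN=0.

```shell
# Hypothetical helper; DRY_RUN=1 (the default) only prints the commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# prepare_root DISK: create the root partition after the 1G ESP, encrypt it
# with LUKS and put a labelled btrfs file system on the opened container.
prepare_root() {
  disk=$1
  part=${disk}2     # /dev/sda2; NVMe disks would need a "p2" suffix instead
  run parted -a optimal -s "$disk" mkpart primary 1G 100%
  run cryptsetup luksFormat "$part"      # prompts for the passphrase
  run cryptsetup open "$part" ENC        # exposes /dev/mapper/ENC
  run mkfs.btrfs -L root_disk /dev/mapper/ENC
}
```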
Next was creating the BTRFS subvolumes. I followed my colleague's suggestion and
created a top-level subvolume @, below which I created @/home, @/var/log and
@/opt. I also enabled zstd compression at level 1 to avoid battery drain.
Finally, I marked @ as the default subvolume so that it does not need to be
specified in the fstab entry.
mount -o compress=zstd:1 /dev/mapper/ENC /mnt
btrfs subvol create /mnt/@
cd /mnt/@
btrfs subvol create ./home
btrfs subvol create ./opt
mkdir -p var
btrfs subvol create ./var/log
btrfs subvol set-default /mnt/@
Bootstrapping Debian
Now that the root disk was prepared, the next step was to bootstrap the root
file system. I used debootstrap for this job. One thing I missed here from the
installer was the ability to preseed. I tried looking around to figure out if
debootstrap can be preseeded but did not find much. If you know the procedure,
do point me to it.
cd /mnt/
debootstrap --include=dbus,locales,tzdata unstable @/ http://deb.debian.org/debian
This just gets a bare minimal installation of Debian; I needed to install the
rest manually afterwards by chrooting into the target folder @/.
I like the grml-chroot command for this purpose, as it does most of the job of
mounting all the required directories like /dev, /proc, /sys etc. But before
entering the chroot I needed to mount the ESP partition we created on /boot/efi
so that I could finalize the installation of the kernel and systemd-boot.
umount /mnt
mount -o compress=zstd:1 /dev/mapper/ENC /mnt
mkdir -p /mnt/boot/efi
mount /dev/sda1 /mnt/boot/efi
grml-chroot /mnt /bin/bash
I remounted the root subvolume @ directly on /mnt; remember I made @ the
default subvolume before. I also mounted the ESP partition with the FAT32 file
system on /boot/efi. Finally, I used grml-chroot to get into a chroot of the
newly bootstrapped file system.
Now I will install the kernel and a minimal KDE desktop, and configure locales
and time zone data for the new system. I wanted to use dracut instead of the
default initramfs-tools for the initrd. I also need to install cryptsetup and
btrfs-progs so I can decrypt and actually boot into my new system.
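A sketch of that step follows, with two assumptions: the package names (linux-image-amd64 and kde-plasma-desktop as the kernel and minimal KDE metapackages) and the LUKS partition being /dev/sda2. It also writes an /etc/crypttab entry mapping that partition to ENC. Both helpers are hypothetical. In the default dry-run mode the UUID is a placeholder rather than the value blkid would report.

```shell
# Hypothetical helpers; DRY_RUN=1 (the default) only prints commands and
# writes a placeholder UUID.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

setup_chroot_packages() {
  run apt-get install --yes linux-image-amd64 dracut cryptsetup btrfs-progs kde-plasma-desktop
}

# write_crypttab ROOT DEV: map the LUKS device DEV to "ENC" in ROOT/etc/crypttab.
write_crypttab() {
  root=$1 dev=$2
  if [ "${DRY_RUN:-1}" = 1 ]; then
    uuid=00000000-0000-0000-0000-000000000000   # placeholder UUID
  else
    uuid=$(blkid -s UUID -o value "$dev")
  fi
  mkdir -p "$root/etc"
  # fields: target name, source device, key file, options
  echo "ENC UUID=$uuid none luks" > "$root/etc/crypttab"
}
```

Inside the chroot this would be DRY_RUN=0 setup_chroot_packages followed by DRY_RUN=0 write_crypttab "" /dev/sda2.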
I've not written the actual UUID above; this is just to show the content of
/etc/crypttab. Once these entries are added, we need to recreate the initrd. I
just reconfigured the installed kernel package to retrigger the recreation of
the initrd using dracut.
Reconfiguring locales was done by editing /etc/locale.gen to uncomment
en_US.UTF-8, and writing Asia/Kolkata to /etc/timezone. I used
DEBIAN_FRONTEND=noninteractive to avoid further prompts asking for locale and
timezone information.
I added my user with the adduser command and also set the root password. I
added my user to the sudo group so I can use sudo to elevate privileges.
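The locale, time zone and user steps described above can be sketched as follows. The helper names and the example user are hypothetical. On newer Debian releases tzdata may ignore /etc/timezone, in which case /etc/localtime has to point at /usr/share/zoneinfo/Asia/Kolkata instead.

```shell
# Hypothetical helpers; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# configure_locale_tz ROOT: uncomment en_US.UTF-8 and set the time zone under ROOT.
configure_locale_tz() {
  root=$1
  sed -i 's/^# *\(en_US\.UTF-8 UTF-8\)$/\1/' "$root/etc/locale.gen"
  echo 'Asia/Kolkata' > "$root/etc/timezone"
}

# apply_locale_tz: regenerate locales and time zone data without prompting.
apply_locale_tz() {
  run env DEBIAN_FRONTEND=noninteractive dpkg-reconfigure locales tzdata
}

# add_admin_user USER: create USER, add it to the sudo group, set root's password.
add_admin_user() {
  run adduser "$1"
  run adduser "$1" sudo
  run passwd root
}
```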
Setting up systemd-boot
Now that a basic usable system was ready, the last part was enabling the
systemd-boot configuration, as I'm not going to use GRUB. I did the following to
install systemd-boot. Frankly, I'm no expert on this; it was my colleague's
suggestion.
Before installing systemd-boot I had to set up the kernel command line. This can
be done by writing the command line to /etc/kernel/cmdline with the following
contents.
systemd.gpt_auto=no quiet root=LABEL=root_disk
I'm disabling systemd-gpt-auto-generator to avoid a race condition between the
crypttab entry and the entry auto-generated by systemd. I ran into this mainly
because I had stupidly forgotten to add root=LABEL=root_disk.
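As a sketch, assuming the ESP is mounted on /boot/efi (one of the locations bootctl looks in), the cmdline file and the systemd-boot installation could be done like this. The kernel version argument is hypothetical. Reconfiguring the kernel package runs the hooks in /etc/kernel/postinst.d, which should give the zz-systemd-boot hook a chance to put the kernel and initrd on the ESP.

```shell
# Hypothetical helpers; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# write_kernel_cmdline ROOT: the command line used by the systemd-boot hooks.
write_kernel_cmdline() {
  root=$1
  mkdir -p "$root/etc/kernel"
  echo 'systemd.gpt_auto=no quiet root=LABEL=root_disk' > "$root/etc/kernel/cmdline"
}

# install_systemd_boot KVER: KVER is the installed kernel version,
# e.g. 5.10.0-9-amd64 (hypothetical).
install_systemd_boot() {
  kver=$1
  run bootctl install                        # install systemd-boot into the ESP
  run dpkg-reconfigure "linux-image-$kver"   # re-run the kernel postinst hooks
}
```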
Finally exit from the chroot and reboot into the freshly installed system.
systemd-boot already ships a hook file, zz-systemd-boot, under /etc/kernel,
so it's pretty much usable without any manual intervention. Previously, after
kernel installation we had to manually update the kernel image in the EFI
partition using bootctl.
Conclusion
Installing from a live image is not new, and debian-installer does the same; the
only difference is more control over the installation, and doing things the
installer does not let you do (or should I say, things that are not part of the
default installation?). If properly automated using scripts, we can leverage
this to do custom installations in large-scale environments. I know there is FAI,
but I've not explored it and felt there is too much to set up for simple
installations with specific requirements.
So finally I have a Debian system which differs from the default Debian
installation :-). I should thank my colleague for rekindling the nerd inside me,
who had stopped experimenting quite a long time back.
In my tubman setup, I started using ZFS on an old server
I had lying around. The machine is really old though (2011!) and it
"feels" pretty slow. I want to see how much of that is ZFS and how
much is the machine. Synthetic benchmarks show that ZFS may be slower
than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm
that on a live workload: my workstation. Plus, I want easy, regular,
high performance backups (with send/receive snapshots) and there's no
way I'm going to use BTRFS because I find
it too confusing and unreliable.
So off we go.
Installation
Since this is a conversion (and not a new install), our procedure is
slightly different than the official documentation but otherwise
it's pretty much in the same spirit: we're going to use ZFS for
everything, including the root filesystem.
So, install the required packages, on the current system:
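The package names and partitioning commands below are a sketch, not necessarily what was run. zfs-dkms and zfsutils-linux come from Debian's contrib section, gdisk provides sgdisk, and the sgdisk invocations follow the layout of the OpenZFS root-on-ZFS guide for Debian. On /dev/sdc they would produce the partition table printed below: BIOS boot, EFI system, boot pool and root pool partitions. --zap-all destroys the existing partition table, so the helpers only print the commands unless DRY_RUN=0.

```shell
# Hypothetical helpers; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

install_zfs_packages() {
  run apt install --yes gdisk zfs-dkms zfsutils-linux
}

# partition_disk DISK: GPT layout for a ZFS root, one partition per role.
partition_disk() {
  disk=$1
  run sgdisk --zap-all "$disk"                      # wipe existing tables
  run sgdisk -a1 -n1:24K:+1000K -t1:EF02 "$disk"    # BIOS boot partition
  run sgdisk -n2:1M:+512M -t2:EF00 "$disk"          # EFI system partition
  run sgdisk -n3:0:+1G -t3:BF01 "$disk"             # boot pool (bpool)
  run sgdisk -n4:0:0 -t4:BF00 "$disk"               # root pool (rpool)
}
```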
root@curie:/home/anarcat# sgdisk -p /dev/sdc
Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
Model: ESD-S1C
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): [REDACTED]
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 16-sector boundaries
Total free space is 14 sectors (7.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 48 2047 1000.0 KiB EF02
2 2048 1050623 512.0 MiB EF00
3 1050624 3147775 1024.0 MiB BF01
4 3147776 1953525134 930.0 GiB BF00
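The Size column follows directly from the sector numbers, assuming the 512-byte logical sectors reported above: (end - start + 1) × 512 bytes. A small helper makes the arithmetic explicit:

```shell
# part_size_kib START END [SECTOR_BYTES]: partition size in KiB
# (integer division, SECTOR_BYTES defaults to 512).
part_size_kib() {
  echo $(( ($2 - $1 + 1) * ${3:-512} / 1024 ))
}
```

For instance, part_size_kib 48 2047 prints 1000 (the 1000.0 KiB BIOS boot partition) and part_size_kib 2048 1050623 prints 524288, which is 512 MiB.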
Unfortunately, we can't be sure of the sector size here, because the
USB controller is probably lying to us about it. Normally, this
smartctl command should tell us the sector size as well:
root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Black Mobile
Device Model: WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
The above is the output for the built-in HDD. But the SSD device
enclosed in that USB controller doesn't support SMART commands,
so we can't trust that it really has 512-byte sectors.
This matters because we need to set the ashift value
correctly. We're going to assume the SSD has the common 4KB
sector size, which means ashift=12.
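ashift is the base-2 logarithm of the sector size ZFS will assume, so 512-byte sectors give ashift=9 and 4096-byte sectors give ashift=12. A small helper computing it, rounding up for sizes that are not a power of two:

```shell
# ashift_for BYTES: smallest n such that 2^n >= BYTES.
ashift_for() {
  n=0
  while [ $((1 << n)) -lt "$1" ]; do
    n=$((n + 1))
  done
  echo "$n"
}
```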
Note here that we are not creating a separate partition for
swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and
that issue is still not fixed upstream. Ubuntu recommends using a
separate partition for swap instead. But since this is "just" a
workstation, we're betting that we will not suffer from this problem,
after hearing a report from another Debian developer running this
setup on their workstation successfully.
We do not recommend this setup though. In fact, if I were to redo this
partition scheme, I would probably use LUKS encryption and setup a
dedicated swap partition, as I had problems with ZFS encryption as
well.
Creating pools
ZFS pools are somewhat like "volume groups" if you are familiar with
LVM, except they obviously also do things like RAID-10. (Even though
LVM can technically also do RAID, people typically use mdadm
instead.)
In any case, the guide suggests creating two different pools here:
one, in cleartext, for boot, and a separate, encrypted one, for the
rest. Technically, the boot partition is required because the Grub
bootloader only supports readonly ZFS pools, from what I
understand. But I'm a little out of my depth here and just following
the guide.
Boot pool creation
The boot pool is created with only the features GRUB supports. The main,
encrypted pool is then created with the following flags:
-O encryption=on -O keylocation=prompt -O keyformat=passphrase:
encryption, prompt for a password, default algorithm is
aes-256-gcm, explicit in the guide, made implicit here
-O acltype=posixacl -O xattr=sa: enable ACLs, with better
performance (not enabled by default)
-O dnodesize=auto: related to extended attributes, less
compatibility with other implementations
-O compression=zstd: enable zstd compression, can be
disabled/enabled per dataset with zfs set compression=off
rpool/example
-O relatime=on: classic atime optimisation, another that could
be used on a busy server is atime=off
-O canmount=off: do not make the pool's top-level dataset
mountable, so it is skipped by zfs mount -a
-O mountpoint=/ -R /mnt: mount pool on / in the future, but
/mnt for now
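Put together, the pool creation could look like the following sketch. It assumes the /dev/sdc partition layout shown earlier (partition 3 for bpool, 4 for rpool), ashift=12 as discussed, and OpenZFS 2.1 or later for the compatibility=grub2 pool property, which restricts bpool to features GRUB can read. The create_pools helper is hypothetical and prints the commands unless DRY_RUN=0.

```shell
# Hypothetical helper; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# create_pools DISK: cleartext boot pool on partition 3, encrypted root pool
# on partition 4, both imported under the /mnt altroot for now.
create_pools() {
  disk=$1
  run zpool create -o ashift=12 -o compatibility=grub2 \
    -O canmount=off -O mountpoint=/boot -R /mnt \
    bpool "${disk}3"
  run zpool create -o ashift=12 \
    -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
    -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
    -O compression=zstd -O relatime=on \
    -O canmount=off -O mountpoint=/ -R /mnt \
    rpool "${disk}4"
}
```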
Those settings are all available in zfsprops(8). Other flags are
defined in zpool-create(8). The reasoning behind them is also
explained in the upstream guide and some also in the Debian
wiki. Those flags were actually not used:
-O normalization=formD: normalize file names on comparisons (not
storage), implies utf8only=on, which is a bad idea (and
effectively meant my first sync failed to copy some files,
including this folder from a supysonic checkout), and this
cannot be changed after the filesystem is created. Bad, bad, bad.
Side note about single-disk pools
Also note that we're living dangerously here: single-disk ZFS pools
are rumoured to be more dangerous than not running ZFS at all. The
choice quote from this article is:
[...] any error can be detected, but cannot be corrected. This
sounds like an acceptable compromise, but its actually not. The
reason its not is that ZFS' metadata cannot be allowed to be
corrupted. If it is it is likely the zpool will be impossible to
mount (and will probably crash the system once the corruption is
found). So a couple of bad sectors in the right place will mean that
all data on the zpool will be lost. Not some, all. Also there's no
ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can be recovered,
this is pretty bad. But we are ready to live with this with the idea
that we'll have hourly offline snapshots that we can easily recover
from. It's a trade-off. Also, we're running this on an NVMe/M.2 drive
which typically just blinks out of existence completely, and doesn't
"bit rot" the way an HDD would.
Also, the FreeBSD handbook quick start doesn't have any warnings
about their first example, which is with a single disk. So I am
reassured at least.
Creating mount points
Next we create the actual filesystems, known as "datasets" which are
the things that get mounted on mountpoint and hold the actual files.
Note that it's unclear to me why those datasets are necessary, but
they seem common practice, also used in this FreeBSD
example. The OpenZFS guide mentions the Solaris upgrades and
Ubuntu's zsys that use that container for upgrades and rollbacks.
This blog post seems to explain a bit the layout behind the
installer.
This creates the actual boot and root filesystems:
Notice here a peculiarity: we must create rpool/var/lib to
create rpool/var/lib/docker otherwise we get this error:
cannot create 'rpool/var/lib/docker': parent does not exist
... and no, just creating /mnt/var/lib doesn't fix that
problem. In fact, it makes things even more confusing because an
existing directory shadows a mountpoint, which is the opposite of
how things normally work.
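A sketch of the dataset layout, assuming the container datasets rpool/ROOT and bpool/BOOT from the OpenZFS guide (rpool/ROOT/debian and bpool/BOOT are the names that show up later in this post). In this sketch, rpool/var and rpool/var/lib use canmount=off (an assumption) so they only serve as parents for rpool/var/lib/docker and do not shadow the copied /var content.

```shell
# Hypothetical helper; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

create_datasets() {
  run zfs create -o canmount=off -o mountpoint=none rpool/ROOT
  run zfs create -o canmount=off -o mountpoint=none bpool/BOOT
  run zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian
  run zfs mount rpool/ROOT/debian              # canmount=noauto: mount by hand
  run zfs create -o mountpoint=/boot bpool/BOOT/debian
  run zfs create rpool/home                    # inherits /home from rpool's /
  run zfs create -o canmount=off rpool/var     # parents must exist first...
  run zfs create -o canmount=off rpool/var/lib
  run zfs create rpool/var/lib/docker          # ...or this fails
}
```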
Also note that you will probably need to change the storage driver in
Docker; see the zfs-driver documentation for details but,
basically, I did:
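Docker's zfs storage driver is normally selected in /etc/docker/daemon.json. Here is a minimal sketch of such a configuration, written under a given root directory. It replaces any existing daemon.json, so merge by hand if you already have one.

```shell
# write_docker_zfs_config ROOT: select the zfs storage driver for dockerd.
write_docker_zfs_config() {
  root=$1
  mkdir -p "$root/etc/docker"
  cat > "$root/etc/docker/daemon.json" <<'EOF'
{
  "storage-driver": "zfs"
}
EOF
}
```

The driver only works if /var/lib/docker is itself on a ZFS dataset, which is why rpool/var/lib/docker exists.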
Now that we have everything setup and mounted, let's copy all files
over.
Copying files
This loops over all the mounted filesystems:
for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
You can check that the list is correct with:
mount -l -t ext4,btrfs,vfat | awk '{ print $3 }'
Note that we skip /srv as it's on a different disk.
On the first run, we had:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
16,831,437 100% 184.14MB/s 0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
28,019,293,280 94% 47.63MB/s 0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,267,990 98% 50.71MB/s 0:10:40 (xfr#736577, to-chk=0/867732)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
24,456,268,098 98% 68.03MB/s 0:05:42 (xfr#159867, ir-chk=6875/172377)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125 93% 74.82MB/s 0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978 96% 76.82MB/s 1:05:15 (xfr#2256473, to-chk=0/2993950)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Note the failure to transfer that supysonic file? It turns out they
had a weird filename in their source tree, since then removed,
but still it showed how the utf8only feature might not be such a bad
idea. At this point, the procedure was restarted all the way back to
"Creating pools", after unmounting all ZFS filesystems (umount
/mnt/run /mnt/boot/efi && umount -t zfs -a) and destroying the pool,
which, surprisingly, doesn't require any confirmation (zpool destroy
rpool).
The second run was cleaner:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/110)
syncing / to /mnt/...
28,019,033,070 97% 42.03MB/s 0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,807,102 98% 44.84MB/s 0:12:04 (xfr#736580, to-chk=0/867723)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
24,043,086,450 96% 62.03MB/s 0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507 96% 67.09MB/s 1:14:43 (xfr#2256845, to-chk=0/2994364)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Also note the transfer speed: we seem capped at 76MB/s, or
608Mbit/s. This is not as fast as I was expecting: the USB connection
seems to be at around 5Gbps:
anarcat@curie:~$ lsusb -tv | head -4
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
ID 1d6b:0003 Linux Foundation 3.0 root hub
__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
ID 0b05:1932 ASUSTek Computer, Inc.
So it shouldn't cap at that speed. It's possible the USB adapter is
failing to give me the full speed though. It's not the M.2 SSD drive
either, as that has a ~500MB/s bandwidth, according to its spec.
At this point, we're about ready to do the final configuration. We
drop to single user mode and do the rest of the procedure. That used
to be shutdown now, but it seems like the systemd switch broke that,
so now you can reboot into grub and pick the "recovery"
option. Alternatively, you might try systemctl rescue, as I found
out.
I also wanted to copy the drive over to another new NVMe drive, but
that failed: it looks like the USB controller I have doesn't work with
older, non-NVME drives.
Boot configuration
Now we need to enter the new system to rebuild the boot loader and
initrd and so on.
First, we bind-mount /dev, /proc and /sys and chroot into the ZFS disk:
mount --rbind /dev /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys /mnt/sys &&
chroot /mnt /bin/bash
Next we add an extra service that imports the bpool on boot, to make
sure it survives a zpool.cache destruction:
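A sketch of such a service, modelled on the zfs-import-bpool.service unit from the OpenZFS root-on-ZFS guide for Debian (double-check it against the guide for your ZFS version). The hypothetical helper writes it under a given root. Inside the chroot you would then run systemctl enable zfs-import-bpool.service.

```shell
# write_bpool_import_unit ROOT: oneshot unit that imports bpool without
# relying on zpool.cache.
write_bpool_import_unit() {
  root=$1
  mkdir -p "$root/etc/systemd/system"
  cat > "$root/etc/systemd/system/zfs-import-bpool.service" <<'EOF'
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around from the guide: move zpool.cache aside during the import
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

[Install]
WantedBy=zfs-import.target
EOF
}
```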
I had to trim down /etc/fstab and /etc/crypttab to only contain
references to the legacy filesystems (/srv is still BTRFS!).
If we don't already have a tmpfs defined in /etc/fstab:
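Debian's systemd package ships a tmp.mount unit under /usr/share/systemd that mounts a tmpfs on /tmp. Here is a sketch of the check and the enable step. The has_tmpfs_tmp helper is hypothetical; it looks for an uncommented tmpfs entry for /tmp in an fstab file.

```shell
# Hypothetical helpers; DRY_RUN=1 (the default) only prints commands.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# has_tmpfs_tmp FSTAB: succeed if FSTAB already mounts a tmpfs on /tmp.
has_tmpfs_tmp() {
  awk '$1 !~ /^#/ && $2 == "/tmp" && $3 == "tmpfs" { found = 1 }
       END { exit !found }' "$1"
}

enable_tmp_mount() {
  run cp /usr/share/systemd/tmp.mount /etc/systemd/system/
  run systemctl enable tmp.mount
}
```

In the chroot: has_tmpfs_tmp /etc/fstab || DRY_RUN=0 enable_tmp_mount.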
Strangely, that's not exactly what the author, Jim Salter, did in
his actual test bench used in the ZFS benchmarking
article. The first thing is there's no read test at all, which is
already pretty strange. But also it doesn't include stuff like
dropping caches or repeating results.
So here's my variation, which I called fio-ars-bench.sh for
now. It just batches a bunch of fio tests, one by one, 60 seconds
each. It should take about 12 minutes to run: there are 3 test
configurations, each run for reads and writes, with and without fsync.
My bias, before building, running and analysing those results is that
ZFS should outperform the traditional stack on writes, but possibly
not on reads. It's also possible it outperforms it on both, because
it's a newer drive. A new test might be possible with a new external
USB drive as well, although I doubt I will find the time to do this.
Results
All tests were done on WD blue SN550 drives, which claim to be
able to push 2400MB/s read and 1750MB/s write. An extra drive was
bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD
blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s
read, 530MB/s write. Benchmarks were done on the M.2 SSD drive but
discarded so that the drive difference is not a factor in the test.
In practice, I'm going to assume we'll never reach those numbers
because we're not actually NVMe (this is an old workstation!) so the
bottleneck isn't the disk itself. For our purposes, it might still
give us useful results.
Rescue test, LUKS/LVM/ext4
Those tests were performed with everything shut down, after either
entering the system in rescue mode, or by reaching that target with:
systemctl rescue
The network might have been started before or after the test as well:
systemctl start systemd-networkd
So it should be fairly reliable as basically nothing else is running.
Raw numbers, from the ?job-curie-lvm.log, converted to MiB/s and
manually merged:
test                     read MiB/s  read IOPS  write MiB/s  write IOPS
rand4k4g1x                    39.27      10052       212.15       54310
rand4k4g1x--fsync=1           39.29      10057         2.73         699
rand64k256m16x              1297.00      20751      1068.57       17097
rand64k256m16x--fsync=1     1290.90      20654       353.82        5661
rand1m16g1x                  315.15        315       563.77         563
rand1m16g1x--fsync=1         345.88        345       157.01         157
Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the
64KB blocks with 16 jobs.
Slowest is the random 4k block sync write at an abysmal 3MB/s and 700
IOPS. The 1MB read/write tests have lower IOPS, but that is expected.
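The MiB/s and IOPS columns are tied together by the block size: throughput = IOPS × block size. For the 4k tests, 10052 IOPS × 4 KiB is 39.27 MiB/s, as in the first row. Here is a helper to check this. fio reports rounded IOPS, so the large-block rows only match approximately.

```shell
# iops_to_mibs IOPS BLOCK_KIB: throughput in MiB/s, two decimals.
iops_to_mibs() {
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs / 1024 }'
}
```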
Rescue test, ZFS
This test was also performed in rescue mode.
Raw numbers, from the ?job-curie-zfs.log, converted to MiB/s and
manually merged:
test                     read MiB/s  read IOPS  write MiB/s  write IOPS
rand4k4g1x                    77.20      19763        27.13        6944
rand4k4g1x--fsync=1           76.16      19495         6.53        1673
rand64k256m16x              1882.40      30118        70.58        1129
rand64k256m16x--fsync=1     1865.13      29842        71.98        1151
rand1m16g1x                  921.62        921       102.21         102
rand1m16g1x--fsync=1         908.37        908        64.30          64
Peaks are at 1.8GiB/s read, also in the 64k job like above, but much
faster. The write is, as expected, much slower at 70MiB/s (compared
to 1GiB/s!), but it should be noted the sync write doesn't degrade
performance compared to async writes (although it's still well below the
~350MiB/s LVM gets in the same test).
Conclusions
Really, ZFS has trouble performing in all write conditions. The
random 4k sync write test is the only place where ZFS outperforms
LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes
are much slower, sometimes by an order of magnitude.
And before some ZFS zealot jumps in talking about the SLOG or some
other cache that could be added to improve performance, I'll remind
you that those numbers are on a bare bones NVMe drive, pretty much as
fast storage as you can find on this machine. Adding another NVMe
drive as a cache probably will not improve write performance here.
Still, those are very different results than the tests performed by
Salter which show ZFS beating traditional configurations in all
categories but uncached 4k reads (not writes!). That said, those tests
are very different from the tests I performed here, where I test
writes on a single disk, not a RAID array, which might explain the
discrepancy.
Also, note that neither LVM nor ZFS manages to reach the 2400MB/s read
and 1750MB/s write performance specification. ZFS does manage to reach
82% of the read performance (1973MB/s) and LVM 64% of the write
performance (1120MB/s). LVM hits 57% of the read performance and ZFS
hits barely 6% of the write performance.
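Those percentages come from converting the peak MiB/s figures from the tables into the MB/s (10^6 bytes) used by the drive specification, that is, multiplying by 1.048576, then dividing by the spec:

```shell
# pct_of_spec MIB_PER_S SPEC_MB_PER_S: share of the spec reached, in percent.
pct_of_spec() {
  awk -v mib="$1" -v spec="$2" 'BEGIN { printf "%.0f\n", mib * 1.048576 / spec * 100 }'
}
```

ZFS reads: pct_of_spec 1882.40 2400 gives 82; LVM writes: pct_of_spec 1068.57 1750 gives 64.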
Overall, I'm a bit disappointed in the ZFS write performance here, I
must say. Maybe I need to tweak the record size or some other ZFS
voodoo, but I'll note that I didn't have to do any such configuration
on the other side to kick ZFS in the pants...
Real world experience
This section documents not synthetic benchmarks, but actual real-world
workloads, comparing before and after I switched my workstation to
ZFS.
Docker performance
I had the feeling that running some git hook (which was firing a
Docker container) was "slower" somehow. It seems that, at runtime, ZFS
backends are significantly slower than their overlayfs/ext4 equivalents:
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
Translating this:
container setup: ~1 second
container runtime: ~1 second
container teardown: ~1 second
total runtime: 2-3 seconds
Obviously, those timestamps are not quite accurate enough to make
precise measurements...
After I switched to ZFS:
mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded.
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
That's double or triple the run time, from 2 seconds to 6
seconds. Most of the time is spent in run time, inside the
container. Here's the breakdown:
container setup: ~2 seconds
container run: ~4 seconds
container teardown: ~1 second
total run time: about ~6-7 seconds
That's a two- to three-fold increase! Clearly something is going on
here that I should tweak. It's possible that code path is less
optimized in Docker. I also worry about podman, but apparently it
also supports ZFS backends. Possibly it would perform better, but
at this stage I wouldn't have a good comparison: maybe it would have
performed better on non-ZFS as well...
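The durations above are simply the difference between the first and last journal timestamps. Here is a sketch of that computation with awk. It assumes the default journalctl short output, where the third field is HH:MM:SS, and a span that does not cross midnight. Lines without a timestamp in that position, such as journalctl's "-- Logs begin at" header, are skipped.

```shell
# span_seconds: read journal lines on stdin, print the seconds between the
# first and last timestamped line.
span_seconds() {
  awk '$3 ~ /^[0-9]+:[0-9]+:[0-9]+$/ {
    split($3, t, ":")
    s = t[1] * 3600 + t[2] * 60 + t[3]
    if (!seen) { first = s; seen = 1 }
    last = s
  }
  END { print last - first }'
}
```

For example, journalctl --since '2022-05-30 15:31' --until '2022-05-30 15:32' | span_seconds.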
Interactivity
While doing the offsite backups (below), the system became somewhat
"sluggish". I felt everything was slow, and I estimate it introduced
~50ms latency in any input device.
Arguably, those are all USB and the external drive was connected
through USB, but I suspect the ZFS drivers are not as well tuned with
the scheduler as the regular filesystem drivers...
Recovery procedures
For test purposes, I unmounted all filesystems during the procedure:
umount /mnt/boot/efi /mnt/run
umount -a -t zfs
zpool export -a
And disconnected the drive, to see how I would recover this system
from another Linux system in case of a total motherboard failure.
To import an existing pool, plug the device, then import the pool with
an alternate root, so it doesn't mount over your existing filesystems,
then you mount the root filesystem and all the others:
zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
Offsite backup
Part of the goal of using ZFS is to simplify and harden backups. I
wanted to experiment with shorter recovery times, specifically both the
recovery point objective and the recovery time objective,
and with faster incremental backups.
This is, therefore, part of my backup services.
This section documents how an external NVMe enclosure was set up as a
pool to mirror the datasets from my workstation.
The final setup should include syncoid copying datasets to the backup
server regularly, but I haven't finished that configuration yet.
Partitioning
The above partitioning procedure used sgdisk, but I couldn't figure
out how to do this with sgdisk, so I used sfdisk to copy the
partition table from the first disk to an external, identical drive.
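A minimal sketch of that copy, assuming the internal disk is /dev/nvme0n1 and the enclosure shows up as /dev/sdc (both names are hypothetical here: check with lsblk before running anything). Because writing a partition table to the wrong disk is not recoverable, the function only prints the pipeline unless DRY_RUN=0.

```shell
#!/bin/sh
# Copy the partition table of one disk onto another, identical, disk.
# Device names are hypothetical: confirm them with lsblk first, because
# sfdisk will overwrite the destination's partition table.
clone_partition_table() {  # $1=source disk, $2=destination disk
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Default: only show what would be run.
        echo "sfdisk --dump $1 | sfdisk $2"
    else
        # --dump writes the table in sfdisk's script format, which
        # sfdisk reads back on stdin to recreate the same layout.
        sfdisk --dump "$1" | sfdisk "$2"
    fi
}

# Usage: DRY_RUN=0 clone_partition_table /dev/nvme0n1 /dev/sdc
```

One caveat: a GPT dump also carries the disk and partition GUIDs, so the clone ends up with the same identifiers as the original, which can confuse anything that looks partitions up by PARTUUID. Running sgdisk -G (--randomize-guids) on the destination afterwards gives it fresh ones.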
First sync
I used syncoid to copy all pools over to the external device. syncoid
is part of the sanoid project and is
specifically designed to sync snapshots between pools, typically over
SSH links, but it can also operate locally.
The sanoid command had a --readonly argument to simulate changes,
but syncoid didn't, so I tried to fix that with an upstream PR.
It seems it would be better to do this by hand, but this was much
easier. The full first sync was:
root@curie:/home/anarcat# ./bin/syncoid -r bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create bpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================> ] 53%
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create rpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================> ] 99%
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================> ] 38%
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%
Funny how the CRITICAL ERROR doesn't actually stop syncoid: it
just carries on merrily even though it's telling you it's "cowardly
refusing to destroy your existing target"... Maybe that's because my pull
request broke something though...
During the transfer, the computer was very sluggish: everything felt
like it had an extra ~30-50ms of latency:
anarcat@curie:sanoid$ LANG=C top -b -n 1 | head -20
top - 13:07:05 up 6 days, 4:01, 1 user, load average: 16.13, 16.55, 11.83
Tasks: 606 total, 6 running, 598 sleeping, 0 stopped, 2 zombie
%Cpu(s): 18.8 us, 72.5 sy, 1.2 ni, 5.0 id, 1.2 wa, 0.0 hi, 1.2 si, 0.0 st
MiB Mem : 15898.4 total, 1387.6 free, 13170.0 used, 1340.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1319.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
70 root 20 0 0 0 0 S 83.3 0.0 6:12.67 kswapd0
4024878 root 20 0 282644 96432 10288 S 44.4 0.6 0:11.43 puppet
3896136 root 20 0 35328 16528 48 S 22.2 0.1 2:08.04 mbuffer
3896135 root 20 0 10328 776 168 R 16.7 0.0 1:22.93 zfs
3896138 root 20 0 10588 788 156 R 16.7 0.0 1:49.30 zfs
350 root 0 -20 0 0 0 R 11.1 0.0 1:03.53 z_rd_int
351 root 0 -20 0 0 0 S 11.1 0.0 1:04.15 z_rd_int
3896137 root 20 0 4384 352 244 R 11.1 0.0 0:44.73 pv
4034094 anarcat 30 10 20028 13960 2428 S 11.1 0.1 0:00.70 mbsync
4036539 anarcat 20 0 9604 3464 2408 R 11.1 0.0 0:00.04 top
352 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
353 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
354 root 0 -20 0 0 0 S 5.6 0.0 1:04.01 z_rd_int
I wonder how much of that is due to syncoid, particularly because I
often saw mbuffer and pv in there, which are not strictly necessary
to do those kinds of operations, as far as I understand.
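For comparison, here is a minimal sketch of doing the replication by hand with a plain zfs send | zfs receive pipeline, with neither mbuffer nor pv involved. It assumes the pool names from the syncoid run above; the snapshot names are hypothetical, and the commands are only printed unless DRY_RUN=0. Whether this is actually lighter on the system is untested here.

```shell
#!/bin/sh
# Manual replication without mbuffer or pv: a sketch, not a tested tool.
# Commands are printed by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

# First copy: take a recursive snapshot and send it with -R, which
# includes every child dataset, its snapshots and its properties. On the
# receiving side, -u leaves the datasets unmounted (their mountpoints
# would otherwise shadow the running system) and -F allows overwriting
# the empty top-level dataset of the freshly created backup pool.
full_sync() {  # $1=source pool, $2=target pool, $3=snapshot name
    run "zfs snapshot -r $1@$3"
    run "zfs send -R $1@$3 | zfs receive -u -F $2"
}

# Later runs: send only what changed since the previous snapshot;
# -I also carries any intermediate snapshots.
incremental_sync() {  # $1=source, $2=target, $3=previous snap, $4=new snap
    run "zfs snapshot -r $1@$4"
    run "zfs send -R -I $1@$3 $1@$4 | zfs receive -u -F $2"
}

# Usage: DRY_RUN=0 full_sync rpool rpool-tubman backup-1
```

The recursive snapshot is what makes this work pool-wide: since rpool is both a pool and its own top-level dataset, one zfs snapshot -r covers every dataset underneath.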
Once the sync is done, export the pools before disconnecting the drive.
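A sketch of that step, assuming the backup pools are the bpool-tubman and rpool-tubman targets used by syncoid above. The loop stops at the first pool that fails to export, so the drive doesn't get unplugged while a pool is still imported; the commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Export the backup pools so the external drive can be unplugged safely.
# Commands are printed unless DRY_RUN=0.
export_pools() {  # $@=pool names
    for pool in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "zpool export $pool"
        elif ! zpool export "$pool"; then
            # Unplugging with a pool still imported can get it suspended,
            # as in the hangs described further down.
            echo "failed to export $pool, do not unplug the drive" >&2
            return 1
        fi
    done
}

# Usage: DRY_RUN=0 export_pools bpool-tubman rpool-tubman
```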
Monitoring
ZFS should be monitoring your pools regularly. Normally, the zed
daemon monitors all ZFS events. It is what reports
when a scrub fails, for example. See this configuration guide.
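As an illustration, the relevant settings live in /etc/zfs/zed.d/zed.rc, which zed reads as shell variable assignments. This excerpt is a sketch: the recipient is an example, and ZED_NOTIFY_VERBOSE=1 is only needed if you also want mail for scrubs that found nothing wrong.

```shell
# /etc/zfs/zed.d/zed.rc (excerpt): zed sources this file as shell.
# Where notifications are mailed; "root" is an example address.
ZED_EMAIL_ADDR="root"
# 0 = suppress notifications while the pool is healthy,
# 1 = notify regardless of pool health.
ZED_NOTIFY_VERBOSE=1
```

After editing the file, restart the daemon (systemctl restart zfs-zed on Debian) so it picks up the change.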
Scrubs should be regularly scheduled to ensure consistency of the
pool. This can be done in newer zfsutils-linux versions
(bullseye-backports or bookworm) with a systemd timer, either weekly or
monthly depending on the desired frequency.
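A sketch, assuming the zfs-scrub-weekly@.timer and zfs-scrub-monthly@.timer systemd templates that newer OpenZFS releases ship, instantiated with the pool name. The command is only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Enable a periodic scrub for one pool through the (assumed) systemd
# timer templates shipped with newer zfsutils-linux packages.
enable_scrub_timer() {  # $1=weekly|monthly, $2=pool name
    case $1 in
        weekly|monthly) ;;
        *) echo "unknown frequency: $1" >&2; return 1 ;;
    esac
    unit="zfs-scrub-$1@$2.timer"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "systemctl enable --now $unit"
    else
        systemctl enable --now "$unit"
    fi
}

# Usage: DRY_RUN=0 enable_scrub_timer monthly rpool
```

Using a template instance per pool means each pool can get its own frequency, and systemctl list-timers shows when the next scrub is due.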
When the scrub runs, if it finds anything it will send an event which
will get picked up by the zed daemon which will then send a
notification, see below for an example.
TODO: deploy on curie, if possible (probably not because no RAID)
TODO: this should be in Puppet
Scrub warning example
So what happens when problems are found? Here's an example of how I
dealt with an error I received.
After setting up another server (tubman) with ZFS, I
eventually ended up getting a warning from the ZFS toolchain.
Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
eid: 39536
class: scrub_finish
host: tubman
time: 2022-10-09 00:58:07-0400
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct 9 00:58:07 2022
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb4 ONLINE 0 1 0
sdc4 ONLINE 0 0 0
cache
sda3 ONLINE 0 0 0
errors: No known data errors
This, in itself, is a little worrisome. But it helpfully links to this
more detailed documentation (and props there: the link still
works), which explains this is a "minor" problem (something that could
have been stated in the report itself).
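When a report like this comes in, the first step is finding which device the counters point at. Here is a small sketch that reads zpool status output on stdin and prints every vdev with a non-zero READ, WRITE or CKSUM counter; it assumes the column layout shown in the report above.

```shell
#!/bin/sh
# Print "device read write cksum" for each vdev with a non-zero error
# counter, parsing `zpool status` output given on stdin.
find_erroring_devices() {
    awk '
        # The vdev table starts after the NAME/STATE/READ/WRITE/CKSUM header...
        /^[ \t]*NAME[ \t]+STATE/ { in_config = 1; next }
        # ...and ends at the "errors:" summary line.
        /^errors:/ { in_config = 0 }
        # Vdev rows have at least 5 fields: name, state and three counters.
        in_config && NF >= 5 && ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }
    '
}
```

For example, zpool status rpool | find_erroring_devices should point at sdb4 for the report above. Once the drive checks out, zpool clear rpool sdb4 resets its counters, as the action line of the report suggests.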
In this case, this happened on a server set up on 2021-04-28, but the
disks and server hardware are much older. The server itself
(marcosv1) was built
around 2011, over 10 years ago now. The hard drive in question is:
root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
That's over a year of power-on time, which shouldn't be so bad. It has
written about 10.8TB of data (21107792664 LBAs * 512 byte/LBA), which
is about 2.7 full drive writes. According to its specification, this
device is supposed to support 55 TB/year of writes, so we're far below
spec. Note that we are still far from the "non-recoverable read error per
bits" spec (1 per 10E15), as we've basically read 13E12 bits
(3201579750 LBAs * 512 byte/LBA * 8 bit/byte ≈ 13E12 bits).
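The arithmetic above can be double-checked with plain shell integer arithmetic (64-bit in dash and bash). This sketch assumes, like the figures above, that the SMART LBA counters use 512-byte units, and takes the capacity from the User Capacity line of smartctl -i.

```shell
#!/bin/sh
# Re-derive the SMART figures quoted above with integer arithmetic.
# Assumes the LBA counters use 512-byte units.
lbas_to_bytes() {
    echo $(( $1 * 512 ))
}

written_bytes=$(lbas_to_bytes 21107792664)  # Total_LBAs_Written
read_bytes=$(lbas_to_bytes 3201579750)      # Total_LBAs_Read
read_bits=$(( read_bytes * 8 ))
capacity=4000787030016                      # "User Capacity" from smartctl -i

# Number of full-drive writes, scaled by 10 to keep one decimal place.
full_writes_x10=$(( written_bytes * 10 / capacity ))

echo "written: $written_bytes bytes"
echo "read: $read_bits bits"
echo "full drive writes: $(( full_writes_x10 / 10 )).$(( full_writes_x10 % 10 ))"
```

It reports 10807189843968 bytes written (about 10.8TB), 13113670656000 bits read (about 13E12) and 2.7 full drive writes.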
It's likely this disk was made in 2018, so it is in its fourth
year.
Interestingly, /dev/sdc is also a Seagate drive, but of a different
series:
root@tubman:~# smartctl -qnoserial -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
It has seen many more reads than the other disk, which is also interesting.
That's 4 years of Head_Flying_Hours, and over 4 years (4 years and
48 days) of Power_On_Hours. The copyright date on that drive's
specs goes back to 2016, so it's a much older drive.
SMART self-test succeeded.
Remaining issues
TODO: move send/receive backups to offsite host, see also
zfs for alternatives to syncoid/sanoid there
TODO: document this somewhere: bpool and rpool are both pools and
datasets. That's pretty confusing, but also very useful because it
allows for pool-wide recursive snapshots, which are used for the
backup system
fio improvements
I really want to improve my experience with fio. Right now, I'm just
cargo-culting stuff from other folks and I don't really like
it. stressant is a good example of my struggles, in the sense
that it doesn't really work that well for disk tests.
I would love to have just a single .fio job file that lists multiple
jobs to run serially. For example, this file describes the above
workload pretty well:
[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1
... except the jobs are actually started in parallel, even though they
are stonewall'd, as far as I can tell by the reports. I sent a
mail to the fio mailing list for clarification.
It looks like the jobs are started in parallel, but actually
(correctly) run serially. It seems like this might just be a matter of
reporting the right timestamps in the end, although it does feel like
starting all the processes (even if not doing any work yet) could
skew the results.
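One way to sidestep the question is to give each job its own fio process with fio's --section option, which runs only the named section of a job file while still applying its [global] options. A sketch, assuming the job file above is saved as workload.fio (a hypothetical name); the fio commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# List the job sections of a fio job file, skipping [global].
list_fio_sections() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$1" | grep -vx global
}

# Run each section in its own fio process, strictly one after the other,
# so no job process exists before the previous one has finished.
run_fio_serially() {  # $1=job file
    jobfile=$1
    list_fio_sections "$jobfile" | while read -r section; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "fio --section=$section $jobfile"
        else
            # Keep fio off the loop's stdin.
            fio --section="$section" "$jobfile" < /dev/null
        fi
    done
}

# Usage: DRY_RUN=0 run_fio_serially workload.fio
```

The trade-off is that each run produces its own report instead of one combined summary.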
Hangs during procedure
During the procedure, ZFS commands completely hung a few
times. It seems that using an external USB drive to
sync stuff didn't work so well: sometimes it would reconnect under a
different device name (from sdc to sdd, for example), and this would
greatly confuse ZFS.
Here, for example, is sdd reappearing out of the blue:
May 19 11:22:53 curie kernel: [ 699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [ 699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [ 699.922433] scsi 4:0:0:0: Direct-Access ROG ESD-S1C 0 PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [ 699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [ 699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [ 699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [ 699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [ 699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [ 699.961602] sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [ 699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk
Next time I run a ZFS command (say zpool list), the command
completely hangs (D state) and this comes up in the logs:
May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648]
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync state:D stack: 0 pid: 997 ppid: 2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670] schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675] ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678] io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689] __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702] __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816] zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929] dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032] spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138] ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245] txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354] ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366] thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376] ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379] kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382] ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386] ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412] ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420] schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424] __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435] ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537] spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644] zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool state:D stack: 0 pid:11815 ppid: 3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995] io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004] cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118] txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223] txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325] spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430] ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
Here's another example, where you see the USB controller blinking out
of and back into existence:
mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel: __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel: ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel: spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel: zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool state:D stack: 0 pid:11815 ppid: 2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel: cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel: ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel: txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel: txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel: spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel: ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
I understand those are rather extreme conditions: I would fully expect
the pool to stop working if the underlying drives disappear. What
doesn't seem acceptable is that a command would completely hang like
this.
References
See the zfs documentation for more information about ZFS,
and tubman for another installation and migration procedure.